Publications by Ken Wood
Modeling and Prediction for Movies
Setup Load packages library(ggplot2) library(dplyr) library(statsr) library(plotly) library(GGally) Introduction Congratulations on getting a job as a data scientist at Paramount Pictures! Your boss has just acquired data about how much audiences and critics like movies as well as numerous other variables about the movies. This dataset is provi...
5482 sym R (10849 sym/21 pcs) 1 img
Multiple Linear Regression
Grading the professor Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related charac...
12915 sym R (9659 sym/46 pcs) 7 img 1 tbl
Intro to Linear Regression
Batter up The movie Moneyball focuses on the “quest for the secret of success in baseball”. It follows a low-budget team, the Oakland Athletics, who believed that underused statistics, such as a player’s ability to get on base, better predict the ability to score runs than typical statistics like home runs, RBIs (runs batted in), and battin...
11975 sym R (7614 sym/39 pcs) 5 img
Statistical Inference with GSS Data
Setup Load packages library(ggplot2) library(dplyr) library(statsr) library(plotly) Load data and clean by removing columns where ‘NA’ values comprise more than 10% of total rows. We will also delete the caseid column since it serves no useful purpose in our analysis. load("gss.RData") gss_filtered <- gss[,colSums(is.na(gss)) <= 0.1*nrow(gss...
4199 sym R (2794 sym/11 pcs) 2 img
Foundations for inference - Confidence intervals
Complete all Exercises, and submit answers to Questions on the Coursera platform. If you have access to data on an entire population, say the size of every house in Ames, Iowa, it’s straightforward to answer questions like, “How big is the typical house in Ames?” and “How much variation is there in sizes of houses?”. If you have acces...
8641 sym R (2661 sym/24 pcs) 1 img
Intro to Probability & Data in R: Data Analysis Project
Introduction The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States (US) and participating US territories and the Centers for Disease Control and Prevention (CDC). The BRFSS is administered and supported by CDC’s Population Health Surveillance Branch, under the Division o...
8466 sym R (7074 sym/23 pcs)
Intro to Probability & Data in R: Data Analysis Project
Introduction The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States (US) and participating US territories and the Centers for Disease Control and Prevention (CDC). The BRFSS is administered and supported by CDC’s Population Health Surveillance Branch, under the Division o...
5610 sym R (54 sym/2 pcs)
Intro to R and RStudio
Complete all Exercises, and submit answers to Questions on the Coursera platform. The goal of this lab is to introduce you to R and RStudio, which you’ll be using throughout the course both to learn the statistical concepts discussed in the course and to analyze real data and come to informed conclusions. To straighten out which is which: R is...
17745 sym R (2404 sym/29 pcs) 3 img
Statistical Inference for Numerical Data
Getting Started Load packages In this lab we will explore the data using the dplyr package and visualize it using the ggplot2 package for data visualization. The data can be found in the companion package for this course, statsr. Let’s load the packages. library(statsr) library(dplyr) library(ggplot2) library(plotly) The data In 2004, the sta...
8024 sym R (5023 sym/27 pcs) 6 img 1 tbl
Posterior Probabilities
Background Some people refer to slot machines as “One-armed Bandits” due to the older style of machine requiring the player to pull a mechanical handle to play. Statisticians and mathematicians often develop theories / models based on games of chance which turn out to be more generally useful. One general class of probability / optimization p...
11338 sym R (1706 sym/16 pcs) 1 img