Publications by Harris Wohl
Tooth Length Analysis
Brief Overview of the Data The description of the ToothGrowth dataset in R Documentation is as follows: The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (...
Homework 6
7.4) The file agehw.dat contains data on the ages of 100 married couples sampled from the U.S. population. a) Formulate a semiconjugate prior distribution for the mean husband and wife ages \(\theta = (\theta_h, \theta_w)^T\) and covariance matrix \(\Sigma\). agehw <- read.table("./agehw.dat", header = TRUE) ybar <- apply(agehw, 2, mean) y...
US Mask Data Analysis
Introduction Wherever I look, I can’t seem to escape the narrative of a widespread “anti-mask” movement. This informal analysis sets out to shed some light on the following questions: 1) Is the open refusal to wear masks actually widespread, or is this being overblown? 2) If this variation in mask adoption does exist, is it correlated wit...
HW 1
library(ISLR) library(MASS) Conceptual Exercises 1) Which method would be better: flexible or inflexible? a) A flexible method would work better in this case since the sample size is so large. Additionally, since the number of predictors p is small, the model might still be interpretable. b) For the opposite case, an inflexible method might ...
HW 3
2) 4) a) a) Since the predictor is uniformly distributed, 10 percent of the available observations will be used to make each prediction. b) In the case where p = 2, 1/10 of the X1 observations will be used, and 1/10 of the X2 observations will be used to make each prediction. If we think of this criterion visually as a “box”, the area of such ...
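The fractions described above can be checked with a short sketch. It assumes each predictor is uniform on [0, 1] and that a prediction uses the observations within a 10% range of the test point on every predictor; the value p = 100, the test point `x0`, and the simulated sample are illustrative assumptions rather than part of the original write-up.

```r
# Fraction of observations used when each of p predictors contributes a 10% range.
# p = 100 is an illustrative extra case, not taken from the excerpt above.
p <- c(1, 2, 100)
fraction_used <- 0.1^p

# Simulation check for p = 2 (hypothetical uniform data, test point at the center).
set.seed(1)
n <- 100000
X <- matrix(runif(n * 2), ncol = 2)
x0 <- c(0.5, 0.5)

# An observation is "near" x0 if it is within 0.05 of x0 on both predictors.
in_box <- abs(X[, 1] - x0[1]) <= 0.05 & abs(X[, 2] - x0[2]) <= 0.05
simulated_fraction <- mean(in_box)
```

Here `simulated_fraction` should land close to `fraction_used[2]`, and the p = 100 entry shows how quickly the usable neighborhood shrinks as predictors are added.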
HW 2
library(MASS) library(ISLR) library(car) Conceptual Exercises 3) a) iii If we fix IQ and GPA, the model for males is the following: salary = 50 + 20 * gpa + .07 * iq + .01 * (gpa * iq) and the model for females is the following: salary = 85 + 10 * gpa + .07 * iq + .01 * (gpa * iq) So, females have an intercept of 85 and males have an inte...
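As a rough check of the two fitted equations quoted above, the sketch below codes them directly and compares predicted salaries at a fixed IQ. The function names and the IQ value of 110 are hypothetical choices made only for illustration.

```r
# Predicted salary for males and females, using the two equations quoted above.
# Function names and the example IQ value are illustrative assumptions.
salary_male <- function(gpa, iq) 50 + 20 * gpa + 0.07 * iq + 0.01 * (gpa * iq)
salary_female <- function(gpa, iq) 85 + 10 * gpa + 0.07 * iq + 0.01 * (gpa * iq)

iq <- 110
gpa <- c(3.0, 3.5, 4.0)

# The IQ terms cancel, so the gap depends only on GPA: 10 * gpa - 35.
gap_male_minus_female <- salary_male(gpa, iq) - salary_female(gpa, iq)
```

The gap changes sign at a GPA of 3.5: below that the female equation predicts the higher salary, and above it the male equation does.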
HW 7
Conceptual Exercises 2 and 4) These were handwritten and turned in separately. 5) For the majority vote method, we would choose red since 6/10 estimates were greater than .5. mean(c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75)) ## [1] 0.45 For the average probability approach, we would choose green since the average of the bootstrapped est...
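The two decision rules in exercise 5 can be compared side by side. This is a minimal sketch using the ten estimates listed above and assuming a 0.5 cutoff for classifying as red; the variable names are illustrative.

```r
# Ten bootstrapped estimates of P(red), as listed in the exercise above.
# They must be wrapped in c(): mean(0.1, 0.15, ...) would average only the first value.
probs <- c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75)

# Majority vote: classify each estimate on its own, then pick the more common class.
votes_red <- sum(probs > 0.5)
majority_class <- if (votes_red > length(probs) / 2) "red" else "green"

# Average probability: average the estimates first, then classify once.
avg_prob <- mean(probs)
average_class <- if (avg_prob > 0.5) "red" else "green"
```

The two rules disagree here because the four estimates that vote green sit far below 0.5, which pulls the average under the cutoff even though they are outnumbered.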
HW 5
Conceptual Exercises 2) a) Lasso is less flexible than least squares: the least squares model contains a coefficient estimate for every predictor input into the model, whereas lasso keeps coefficient estimates for only a subset of the predictors and shrinks the coefficients based on lambda. Since it is less flexible, it will ha...
HW 4
2) Chapter 5 Conceptual Exercise 4 Use bootstrapping: take n samples of paired \((X_i, Y_i)\) from the original dataset (with replacement), fit the statistical learning model on the new dataset, and predict Y using the value of X given in the problem. Repeat this process a large number of times, which should give a distribution of predictions for...
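The bootstrap procedure described above might look like the following sketch. The simulated data, the linear model, the query value `x_new`, and the number of resamples `B` are all assumptions made for illustration; the original exercise does not fix a dataset or a learning method.

```r
set.seed(1)

# Simulated data, purely illustrative (true relationship: y = 2 + 3x + noise).
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 3 * dat$x + rnorm(n)

# Hypothetical value of X at which the prediction is wanted.
x_new <- data.frame(x = 1)

# Resample rows with replacement, refit, and predict; repeat B times.
B <- 1000
boot_preds <- replicate(B, {
  idx <- sample(n, n, replace = TRUE)       # n row indices drawn with replacement
  fit <- lm(y ~ x, data = dat[idx, ])       # refit on the bootstrap sample
  unname(predict(fit, newdata = x_new))     # prediction at x_new
})

# The spread of the bootstrap predictions estimates the prediction's variability.
sd_pred <- sd(boot_preds)
```

Any other learning method could be swapped in for `lm()`; only the resample, refit, predict loop matters for the technique.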
Final Project Technical Report
Introduction The goal of this analysis is to build a model that predicts house prices and identifies the most important factors in house value. To me, this meant that the real estate company was mainly interested in a good prediction model and would value predictive power over interpretability and inference. For this reason, my final prediction ...