Publications by Tim F.
STA6543-Assignment 2
Question 2 The differences between the KNN classifier and KNN regression methods are as follows: KNN classifier methods are typically used when working with categorical data, where classification is of greater importance to the study. KNN classifier methods choose the \(k\) nearest neighbors to a given \(x_0\). A conditional probability is then c...
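A minimal base-R sketch of the distinction (toy data, not from the assignment): the same \(k\) nearest neighbors of \(x_0\) are combined by majority vote for classification and by averaging for regression.

```r
# Hypothetical toy data: one predictor, a class label, and a numeric response
x  <- c(1, 2, 3, 10, 11, 12)          # 1-D predictor
cl <- c("A", "A", "A", "B", "B", "B") # class labels
y  <- c(1.0, 1.2, 0.9, 5.1, 5.0, 4.8) # numeric response
x0 <- 2.5                             # query point
k  <- 3

nn <- order(abs(x - x0))[1:k]         # indices of the k nearest neighbors

# KNN classifier: estimate Pr(Y = j | X = x0) by the neighbors' vote share,
# then assign the majority class
knn_class <- names(which.max(table(cl[nn])))

# KNN regression: estimate f(x0) by averaging the neighbors' responses
knn_reg <- mean(y[nn])

knn_class  # "A" -- all three nearest neighbors are class A
knn_reg    # average of the three nearest responses
```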
Chapter 9- Support Vector Machines
Question 5 Part A set.seed(8) x1 = runif(500) - 0.5 x2 = runif(500) - 0.5 y = 1 * (x1^2 - x2^2 > 0) Part B df <- tibble(x1, x2, y) colors <- c("red", "blue") ggplot(df, aes(x1, x2, color = y)) + geom_point() + scale_color_gradient(low = "red", high = "blue") Part C glm_fit <- glm(y ~ ., family = "binomial", data = df) summary(glm_f...
STA6543-Assignment 7 (Tree Based Methods)
Question 3 prob = seq(0, 1, 0.001) gini = tibble(y = (prob * (1 - prob) * 2)) %>% mutate(x = row_number()/1000) entropy = tibble(y = (-(prob * log(prob) + (1 - prob) * log(1 - prob)))) %>% mutate(x = row_number()/1000) class_error = tibble(y = (1 - pmax(prob, 1 - prob))) %>% mutate(x = row_number()/1000) ggplot() + geom_lin...
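For reference, the three node-impurity measures computed in that snippet can be written, for the two-class case with \(\hat{p}\) the proportion of class-1 observations in the node, as:

```latex
% Two-class impurity measures, matching the R code above
G = 2\,\hat{p}\,(1 - \hat{p})
    \quad \text{(Gini index)}
D = -\bigl[\hat{p}\log\hat{p} + (1 - \hat{p})\log(1 - \hat{p})\bigr]
    \quad \text{(entropy)}
E = 1 - \max(\hat{p},\, 1 - \hat{p})
    \quad \text{(classification error)}
```

All three peak at \(\hat{p} = 0.5\) and vanish at \(\hat{p} \in \{0, 1\}\), which is the pattern the plotted curves show.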
STA6543-Assignment 5
Question 2 Part A Solution iii would be the most appropriate here, since the lasso method inherently limits the number of predictors present in the model and therefore reduces variance while increasing bias. Part B Solution iii would be the most appropriate here, since the ridge regression method "shrinks" the coefficient estimates toward zero (rather than removing predictors outright) if they...
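A base-R sketch of the ridge shrinkage mentioned above, using the closed form \(\hat{\beta}^{\text{ridge}} = (X^{T}X + \lambda I)^{-1}X^{T}y\) on simulated toy data (not the assignment's data, and no `glmnet` assumed). As \(\lambda\) grows the coefficients shrink toward zero but, unlike the lasso, none are set exactly to zero.

```r
# Simulated data: 50 observations, 3 predictors with known coefficients
set.seed(8)
n <- 50; p <- 3
X <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1, 0.5)
y <- X %*% beta + rnorm(n)

# Ridge solution via its closed form: (X'X + lambda I)^{-1} X'y
ridge <- function(X, y, lambda) {
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
}

b0   <- ridge(X, y, 0)     # lambda = 0 recovers ordinary least squares
b100 <- ridge(X, y, 100)   # large lambda: heavily shrunk coefficients

# The shrunk fit has a smaller coefficient norm, yet every entry stays nonzero
sum(b100^2) < sum(b0^2)
```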
Chapter 7 - Moving Beyond Linearity
Question 6 Part A set.seed(8) wage <- Wage cv_error <- rep(0, 8) for (i in 1:8) { glm_fit <- glm(wage ~ poly(age, i), data = wage) cv_error[i] <- cv.glm(wage, glm_fit, K = 8)$delta[1] } cv_error ## [1] 1676.442 1600.397 1594.922 1595.631 1595.172 1594.701 1595.001 1597.522 plot(cv_error, type = "b", main = "Polynomial Error Chec...
Homework 3 (Logistic Regression)
Question 10 Part A weekly <- Weekly # SKIM DATASET, LOOK AT VARIABLES skimr::skim(weekly) [skim data summary: name weekly; 1089 rows, 9 columns; column type frequency: 1 factor, 8 numeric; no group variables] Variable type: fac...
Homework 4 (Resampling Methods)
Question 3 Part A k-fold cross-validation is implemented by randomly dividing the set of observations into k folds, or groups, of approximately equal size. The first fold of observations is set aside as a validation set and excluded from fitting, while the remaining k - 1 folds are used to fit the model. The first mean squ...
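The procedure described above can be hand-rolled in a few lines of base R. This is a sketch on simulated data (a linear model and k = 5 are illustrative choices, not from the assignment): each fold serves once as the validation set while the other k - 1 folds fit the model, and the k fold-level MSEs are averaged.

```r
# Simulated data: simple linear relationship with unit-variance noise
set.seed(8)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x, y)

k <- 5
folds <- sample(rep(1:k, length.out = n))  # random fold assignment, near-equal sizes

mse <- numeric(k)
for (i in 1:k) {
  fit    <- lm(y ~ x, data = dat[folds != i, ])       # fit on the k-1 training folds
  pred   <- predict(fit, newdata = dat[folds == i, ]) # predict the held-out fold
  mse[i] <- mean((dat$y[folds == i] - pred)^2)        # fold-level test MSE
}
cv_error <- mean(mse)  # k-fold CV estimate of the test error
```

With well-specified noise of variance 1, the CV estimate should land near 1; `cv.glm()` from the boot package automates the same loop.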