Publications by Torey Tonche

STA 6543 - Homework 2

26.02.2021

Problem 2 A key difference between the KNN classifier and KNN regression is the type of response variable each is used for. The KNN classifier is used for qualitative response variables, whereas KNN regression is used for quantitative response variables. Problem 9 9a pairs(auto_data) 9b cor.matrix = cor(auto_data[,-9])...
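The distinction between the two KNN methods can be illustrated with a minimal base R sketch. The toy data and the helper names (nearest, knn_classify, knn_regress) are hypothetical and are not taken from the homework; the sketch only shows that both methods use the same neighbours and differ in how the response is aggregated.

```r
# Toy 1-D training data (made up for illustration)
x_train <- c(1, 2, 3, 6, 7, 8)
y_class <- factor(c("a", "a", "a", "b", "b", "b"))  # qualitative response
y_num   <- c(1.0, 2.1, 2.9, 6.2, 6.8, 8.1)          # quantitative response

# Indices of the k training points closest to x0
nearest <- function(x0, k) order(abs(x_train - x0))[seq_len(k)]

# KNN classifier: majority vote among the k neighbours
knn_classify <- function(x0, k = 3) {
  votes <- table(y_class[nearest(x0, k)])
  names(votes)[which.max(votes)]
}

# KNN regression: average response of the k neighbours
knn_regress <- function(x0, k = 3) mean(y_num[nearest(x0, k)])

knn_classify(2.5)  # predicted class label
knn_regress(2.5)   # predicted numeric value
```

Both functions share the neighbour search; only the final step (vote versus mean) changes with the type of response.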

STA 6543 - Homework 5

02.04.2021

Problem 2 For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer. (a) The lasso, relative to least squares, is: iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. This is because as \(\lambda\) increases, the flexibility of the l...

STA 6543 - Homework 4

27.03.2021

Problem 3 We now review k-fold cross-validation. (a) Explain how k-fold cross-validation is implemented. The data is randomly divided into k groups. Ideally these groups are all the same size, but approximately the same will suffice. A model is then fitted k times, each time using one of the k folds (subsets) as the validation set and remaining subsets comp...
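The procedure described in (a) can be sketched in base R as follows. The simulated data, the linear model, the seed, and k = 5 are all assumptions chosen for illustration; they are not taken from the homework.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Simulated data (hypothetical): linear signal plus noise
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

k <- 5
# Randomly assign each observation to one of k folds of (nearly) equal size
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, compute MSE on the held-out fold
fold_mse <- sapply(seq_len(k), function(i) {
  fit  <- lm(y ~ x, data = dat[fold != i, ])
  pred <- predict(fit, newdata = dat[fold == i, ])
  mean((dat$y[fold == i] - pred)^2)
})

# k-fold CV estimate of the test MSE: average of the k held-out errors
cv_estimate <- mean(fold_mse)
cv_estimate
```

Each observation is used for validation exactly once and for training k - 1 times.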

STA 6543 - Homework 3

06.03.2021

10 10a Volume and Year have a strong positive correlation, meaning that the average daily volume of shares traded (in billions) has increased from year to year. Additionally, the relationship seems to be non-linear (perhaps exponential). Examination of the Direction variable shows that 484 weeks of the data showed a positive return and 605 weeks s...

STA 6543 - Homework 8

04.05.2021

Problem 5 We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features. 5a Generate a data set with n = 500 and p = 2, s...
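The idea in this problem, a non-linear boundary obtained from logistic regression on transformed features, can be sketched in base R. The preview is cut off before the data-generating rule, so the class rule below (probability driven by x1^2 - x2^2), the seed, and all variable names are assumptions for illustration and may differ from what the homework used.

```r
set.seed(1)  # arbitrary seed

# n = 500 observations, p = 2 features, as in the exercise statement
n  <- 500
x1 <- runif(n) - 0.5
x2 <- runif(n) - 0.5

# Assumed class rule (not from the source): class probability depends on
# x1^2 - x2^2, so the true boundary is non-linear in x1 and x2
y <- rbinom(n, size = 1, prob = plogis(20 * (x1^2 - x2^2)))
dat <- data.frame(x1, x2, y)

# Logistic regression on the raw features: linear decision boundary
fit_lin  <- glm(y ~ x1 + x2, data = dat, family = binomial)

# Logistic regression on squared features: boundary is non-linear in (x1, x2)
fit_quad <- glm(y ~ I(x1^2) + I(x2^2), data = dat, family = binomial)

# Training accuracy at a 0.5 probability threshold
accuracy <- function(fit) mean((predict(fit, type = "response") > 0.5) == dat$y)
acc_lin  <- accuracy(fit_lin)
acc_quad <- accuracy(fit_quad)
c(linear = acc_lin, quadratic = acc_quad)
```

The second model is still linear in its coefficients; the non-linearity comes entirely from the transformed inputs.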

STA 6543 - Homework 6

23.04.2021

Problem 6 In this exercise, you will further analyze the Wage data set considered throughout this chapter. 6a Perform polynomial regression to predict wage using age. Use cross-validation to select the optimal degree d for the polynomial. What degree was chosen, and how does this compare to the results of hypothesis testing using ANOVA? Make a ...

STA 6543 - Homework 7

30.04.2021

Problem 3
p1 = seq(0, 1, .01)
p2 = 1 - p1
gini = p1 * p2 + p2 * (1 - p2) # 1 - p2 = p1, so we could just write 2 * p1 * p2 for simplicity
entropy = -(p1 * log(p1) + p2 * log(p2))
class_error = 1 - pmax(p1, p2)
metrics_df = data.frame(p1, p2, gini, entropy, class_error) %>%
  pivot_longer(cols = c(gini, entropy, class_error...
