Publications by Brett Stokes
Case Study 1
R Markdown Data1: Facebook Metrics - case study1 data fbook=read.csv("dataset_Facebook.csv", header=TRUE) library(mice) ## Warning: package 'mice' was built under R version 4.0.3 ## ## Attaching package: 'mice' ## The following object is masked from 'package:stats': ## ## filter ## The following objects are masked from 'package:base...
HW2
Q2 Carefully explain the differences between the KNN classifier and the KNN regression methods. The KNN regression method is very similar to the KNN classifier. For a given x0 and K, KNN regression identifies the K training observations whose x values are closest to x0 and predicts the average of their continuous responses (y-hat). On the other ha...
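The KNN regression prediction described in the HW2 excerpt can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the author's code: the helper name `knn_reg_predict`, the single numeric predictor, the absolute-difference distance, and the toy data are all hypothetical.

```r
# Minimal base-R sketch of KNN regression with one numeric predictor.
# knn_reg_predict() is a hypothetical helper, not taken from the original post.
knn_reg_predict <- function(x_train, y_train, x0, k) {
  stopifnot(length(x_train) == length(y_train),
            k >= 1, k <= length(x_train))
  # Distance from x0 to every training observation
  d <- abs(x_train - x0)
  # Indices of the k nearest training observations
  nearest <- order(d)[seq_len(k)]
  # The prediction (y-hat) is the average response of those neighbours
  mean(y_train[nearest])
}

# Toy data, made up for illustration only
x_train <- c(1, 2, 3, 4, 5)
y_train <- c(2, 4, 6, 8, 10)

y_hat <- knn_reg_predict(x_train, y_train, x0 = 2.9, k = 3)
print(y_hat)
```

With these toy values the three nearest neighbours of 2.9 are x = 3, 2, and 4, so the prediction is the mean of their responses.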
BM Case Study
Loaded bank-additional.csv, removed missing values, dropped the duration column, converted categorical variables to factors, recoded pdays, and created a train/test split. Also removed a row containing the illiterate value in education (3927) and a row containing yes in the default column (3515). knitr::opts_chunk$set(echo = TRUE) lib...
CRCS
Part One: Predict acquisition Train Test Split set.seed(100) data(acquisitionRetention) data1 = acquisitionRetention index = sample(nrow(data1),0.8*nrow(data1)) train = data1[index,] test = data1[-index,] Cleaning Data anyNA(data1) model <- glm(acquisition ~acq_exp + industry + revenue + employees, data = train, family = binomial) vif...
DJ Case Study
Load Data, Set Up Lag Variables, Train Test Split, Lag Plots #load data dow=read.table("dow_jones_index.data", sep = ',', header=TRUE) dow=na.omit(dow) dow$date <- as.Date(dow$date, format = "%m/%d/%y") dow$open = as.numeric(gsub("[\\$,]", "", dow$open)) dow$high = as.numeric(gsub("[\\$,]", "", dow$high)) dow$low = as.numeric(gsub("[\\$,]",...
Case Study 2
library(beepr) library(pROC) ## Type 'citation("pROC")' for a citation. ## ## Attaching package: 'pROC' ## The following objects are masked from 'package:stats': ## ## cov, smooth, var library(InformationValue) library(formula.tools) library(knitr) setwd("C:/Users/bstok/Desktop/Park DA 6213 Data Drive Decision Making and Design/Case...
Assignment 3
10. This question should be answered using the Weekly data set, which is part of the ISLR package. This data is similar in nature to the Smarket data from this chapter’s lab, except that it contains 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010. (a) Produce some numerical and graphical summaries of the Weekly ...
BBCC CS2
Splitting Data bbbc_test <- read_excel("BBBC-Test.xlsx") bbbc_test <- (subset(bbbc_test, select=-c(Observation))) bbbc_test$Gender[bbbc_test$Gender==1]="Male"; bbbc_test$Gender[bbbc_test$Gender==0]="Female"; bbbc_train <- read_excel("BBBC-Train.xlsx") bbbc_train <- (subset(bbbc_train, select=-c(Observation))) bbbc_train$Gender[bbbc_tr...
Assignment 4
3. We now review k-fold cross-validation. (a) Explain how k-fold cross-validation is implemented. The data are randomly split into k groups (folds) of roughly equal size. One fold is used as the validation set, and the remaining k-1 folds are combined to form the training set. The MSE is calculated on the validation set. T...
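The k-fold procedure outlined in the Assignment 4 excerpt can be sketched in base R as follows. This is an illustrative sketch, not the author's code: the simulated data, the linear model, the seed, and k = 5 are assumptions. The final averaging of the fold MSEs is the standard last step of k-fold cross-validation and lies beyond the point where the excerpt is cut off.

```r
set.seed(1)  # arbitrary seed so the fold assignment is reproducible

n <- 100     # number of observations (assumed)
k <- 5       # number of folds (assumed)

# Simulated data, for illustration only
x <- rnorm(n)
y <- 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Randomly assign each row to one of k similarly sized folds
fold <- sample(rep(seq_len(k), length.out = n))

fold_mse <- numeric(k)
for (i in seq_len(k)) {
  train <- dat[fold != i, ]   # the other k-1 folds form the training set
  valid <- dat[fold == i, ]   # fold i is held out as the validation set
  fit <- lm(y ~ x, data = train)
  pred <- predict(fit, newdata = valid)
  fold_mse[i] <- mean((valid$y - pred)^2)  # validation MSE for fold i
}

# Standard k-fold CV estimate: the average of the k validation MSEs
cv_estimate <- mean(fold_mse)
print(cv_estimate)
```

Each observation is held out exactly once, so every row contributes to exactly one validation MSE.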
Assignment 5
Problem 2 For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer. (a) The lasso, relative to least squares, is: i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. ii. More flexible and hence will give improved prediction accuracy w...