Case Study 1
R Markdown Data1: Facebook Metrics - case study1 data fbook=read.csv("dataset_Facebook.csv", header=TRUE) library(mice) ## Warning: package 'mice' was built under R version 4.0.3 ## ## Attaching package: 'mice' ## The following object is masked from 'package:stats': ## ## filter ## The following objects are masked from 'package:base...
Q2 Carefully explain the differences between the KNN classifier and the KNN regression methods. The KNN regression method is very similar to the KNN classifier. For a particular x0 and K, the KNN regression method identifies the K training observations closest to x0 and predicts the average of their continuous response values (y-hat). On the other ha...
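The contrast can be sketched in a few lines of base R. This is a minimal illustration, not the code from the assignment: the toy data and the helper names `nearest_k`, `knn_classify`, and `knn_regress` are made up for this example. Both methods find the same K neighbors of x0; they differ only in how the neighbors' responses are combined.

```r
# Toy 1-D training data (hypothetical, for illustration only)
x_train <- c(1, 2, 3, 6, 7, 8)
y_class <- factor(c("a", "a", "a", "b", "b", "b"))  # qualitative response
y_num   <- c(1, 2, 3, 6, 7, 8)                      # quantitative response

# Indices of the k training points closest to x0
nearest_k <- function(x_train, x0, k) {
  order(abs(x_train - x0))[seq_len(k)]
}

# KNN classifier: majority vote among the k nearest neighbors
knn_classify <- function(x_train, y, x0, k) {
  votes <- table(y[nearest_k(x_train, x0, k)])
  names(votes)[which.max(votes)]
}

# KNN regression: average response of the k nearest neighbors
knn_regress <- function(x_train, y, x0, k) {
  mean(y[nearest_k(x_train, x0, k)])
}

pred_class <- knn_classify(x_train, y_class, x0 = 2.4, k = 3)
pred_value <- knn_regress(x_train, y_num, x0 = 2.4, k = 3)
```

With x0 = 2.4 and K = 3, the neighbors are the points at x = 1, 2, and 3, so the classifier returns a class label while the regression returns the mean of three numeric responses.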
BM Case Study
Load bank-additional.csv, remove missing values, drop the duration column, convert categorical variables to factors, recode pdays, and create a train/test split. Also remove the row that contained the illiterate value in education (3927) and the row that contained yes in the default column (3515). knitr::opts_chunk$set(echo = TRUE) lib...
Part One: Predict acquisition Train Test Split set.seed(100) data(acquisitionRetention) data1 = acquisitionRetention index = sample(nrow(data1),0.8*nrow(data1)) train = data1[index,] test = data1[-index,] Cleaning Data anyNA(data1) model <- glm(acquisition ~acq_exp + industry + revenue + employees, data = train, family = binomial) vif...
DJ Case Study
Load Data, Set Up Lag Variables, Train Test Split, Lag Plots #load data dow=read.table("", sep = ',', header=TRUE) dow=na.omit(dow) dow$date <- as.Date(dow$date, format = "%m/%d/%y") dow$open = as.numeric(gsub("[\\$,]", "", dow$open)) dow$high = as.numeric(gsub("[\\$,]", "", dow$high)) dow$low = as.numeric(gsub("[\\$,]",...
Case Study 2
library(beepr) library(pROC) ## Type 'citation("pROC")' for a citation. ## ## Attaching package: 'pROC' ## The following objects are masked from 'package:stats': ## ## cov, smooth, var library(InformationValue) library( library(knitr) setwd("C:/Users/bstok/Desktop/Park DA 6213 Data Drive Decision Making and Design/Case...
Assignment 3
10. This question should be answered using the Weekly data set, which is part of the ISLR package. This data is similar in nature to the Smarket data from this chapter’s lab, except that it contains 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010. (a) Produce some numerical and graphical summaries of the Weekly ...
Splitting Data bbbc_test <- read_excel("BBBC-Test.xlsx") bbbc_test <- (subset(bbbc_test, select=-c(Observation))) bbbc_test$Gender[bbbc_test$Gender==1]="Male"; bbbc_test$Gender[bbbc_test$Gender==0]="Female"; bbbc_train <- read_excel("BBBC-Train.xlsx") bbbc_train <- (subset(bbbc_train, select=-c(Observation))) bbbc_train$Gender[bbbc_tr...
Assignment 4
3. We now review k-fold cross-validation. (a) Explain how k-fold cross-validation is implemented. The data are randomly split into k groups (folds) of similar size. One fold is held out as the validation set, and the remaining k-1 folds are combined to form the training set. The MSE is calculated on the validation set. T...
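The procedure described in (a) can be sketched in base R as follows. This is an illustrative sketch under stated assumptions: the simulated data frame `sim`, the linear model, and k = 5 are choices made for this example and do not come from the assignment. Each fold serves once as the validation set, and the k validation MSEs are averaged into the cross-validation estimate.

```r
set.seed(1)

# Simulated data (hypothetical, for illustration only)
n <- 100
sim <- data.frame(x = rnorm(n))
sim$y <- 2 * sim$x + rnorm(n)

k <- 5

# Randomly assign each row to one of k folds of similar size
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k-1 folds, compute MSE on the held-out fold
fold_mse <- sapply(seq_len(k), function(i) {
  train <- sim[fold_id != i, ]
  valid <- sim[fold_id == i, ]
  fit <- lm(y ~ x, data = train)
  mean((valid$y - predict(fit, newdata = valid))^2)
})

# Cross-validation estimate of the test MSE: the average over the k folds
cv_estimate <- mean(fold_mse)
```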
Assignment 5
Problem 2 For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer. (a) The lasso, relative to least squares, is: i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. ii. More flexible and hence will give improved prediction accuracy w...
