Publications by Gitanjali Mule

Boston Housing Prices Analysis


Question 2
Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide n and p.
(a) We collect a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry and the CEO salary. We are interested in ...

Data Analytics Application Final Project


data = read.table("D:/Fall 2021/DA Application/project/marketing_campaign.csv", sep = "\t", header = TRUE)
df = data.frame(data)
head(df)
##   ID Year_Birth Education Marital_Status Income Kidhome Teenhome Dt_Customer
## 1 5524 1957 Graduation Single 58138 0 0 04-09-2012
## 2 2174 1954 Graduation ...

ISLR Exercise 6


Question 2
2. (a) Less flexible, and will give improved prediction accuracy when the increase in bias is less than the decrease in variance. As lambda increases, the flexibility of the fit decreases, so the estimated coefficients shrink, with some becoming exactly zero. This leads to a substantial decrease in the variance of the predictions for a small increas...

Exercise 5 ISLR


Question 3
(a) Description & Implementation
Question: Explain how k-fold cross-validation is implemented.
Answer: The data is segmented into k distinct, (usually) equal-sized ‘folds’. A model is trained on k−1 of the folds and tested on the remaining fold. This process is repeated k times, such that each of the k folds acts as the test dat...
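The procedure described in this answer can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the built-in `mtcars` data, the model `mpg ~ wt`, k = 5, and mean squared error as the test metric are all choices made here, not taken from the original exercise.

```r
# Minimal k-fold cross-validation sketch in base R.
# Assumptions (not from the source): built-in mtcars data,
# the model mpg ~ wt, k = 5, and MSE as the test metric.
set.seed(1)
k <- 5
n <- nrow(mtcars)

# Randomly assign each row to one of k roughly equal-sized folds.
fold_id <- sample(rep(1:k, length.out = n))

fold_mse <- numeric(k)
for (i in 1:k) {
  train <- mtcars[fold_id != i, ]   # the other k - 1 folds, used for fitting
  test  <- mtcars[fold_id == i, ]   # the held-out fold, used for testing
  fit   <- lm(mpg ~ wt, data = train)
  pred  <- predict(fit, newdata = test)
  fold_mse[i] <- mean((test$mpg - pred)^2)
}

# The cross-validation estimate is the average of the k fold-level errors.
cv_mse <- mean(fold_mse)
cv_mse
```

Each row lands in exactly one fold, so every observation is used for testing once and for training k − 1 times.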

ISLR Exercise 3.7


Question 1
2. Carefully explain the differences between the KNN classifier and KNN regression methods.
Answer: The KNN classifier is used for classification problems, while the KNN regression method is used for continuous responses (regression problems). The KNN classifier assigns a point to the class held by the majority of its k nearest neighbours, while regression esti...
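The contrast between the two methods can be made concrete with a small base R sketch. The toy one-dimensional data, the query point `x0`, and k = 3 are hypothetical values chosen for illustration; neighbours are found by absolute distance, and regression is shown as the plain average of the neighbours' responses.

```r
# Sketch contrasting KNN classification and KNN regression in base R.
# Assumptions (not from the source): toy 1-D data, absolute distance, k = 3.
x       <- c(1, 2, 3, 6, 7, 8)
y_class <- c("A", "A", "A", "B", "B", "B")   # qualitative response
y_num   <- c(1.0, 1.5, 2.0, 6.5, 7.0, 7.5)   # quantitative response
x0 <- 2.4                                    # query point
k  <- 3

# Indices of the k training points closest to x0 (shared by both methods).
nn <- order(abs(x - x0))[1:k]

# Classification: majority vote among the neighbours' classes.
votes     <- table(y_class[nn])
knn_class <- names(votes)[which.max(votes)]

# Regression: average of the neighbours' numeric responses.
knn_reg <- mean(y_num[nn])

knn_class
knn_reg
```

Both methods use the same neighbourhood; they differ only in how the neighbours' responses are combined.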

ISLR Exercise 4


Question 10
head(Weekly)
##   Year Lag1 Lag2 Lag3 Lag4 Lag5 Volume Today Direction
## 1 1990 0.816 1.572 -3.936 -0.229 -3.484 0.1549760 -0.270 Down
## 2 1990 -0.270 0.816 1.572 -3.936 -0.229 0.1485740 -2.576 Down
## 3 1990 -2.576 -0.270 0.816 1.572 -3.936 0.1598375 3.514 Up
## 4 1990 3.514 -2.576 -0.270 ...

Assignment 7


Question 3
p=seq(0,1,0.01)
gini= 2*p*(1-p)
classerror= 1-pmax(p,1-p)
crossentropy= -(p*log(p)+(1-p)*log(1-p))
plot(NA,NA,xlim=c(0,1),ylim=c(0,1),xlab='p',ylab='f')
lines(p,gini,type='l')
lines(p,classerror,col='blue')
lines(p,crossentropy,col='red')
legend(x='top',legend=c('gini','class error','cross entropy'), col=c('black','...

Assignment 8


library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(ISLR)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## v purrr 0.3.4
##...

Assignment 6


custom_regression_metrics <- function (data, lev = NULL, model = NULL) {
  c(RMSE = sqrt(mean((data$obs-data$pred)^2)),
    Rsquared = summary(lm(pred ~ obs, data))$r.squared,
    MAE = mean(abs(data$obs-data$pred)),
    MSE = mean((data$obs-data$pred)^2),
    RSS = sum((data$obs-data$pred)^2))
}
ctrl <- trainControl(method = "cv", number ...

