Publications by Gitanjali Mule
Boston Housing Prices Analysis
Question 2 Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide n and p. (a) We collect a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry and the CEO salary. We are interested in ...
7749 sym R (64657 sym/65 pcs) 12 img
Data Analytics Application Final Project
data = read.table("D:/Fall 2021/DA Application/project/marketing_campaign.csv", sep = "\t", header = TRUE)
df = data.frame(data)
head(df)
##     ID Year_Birth  Education Marital_Status Income Kidhome Teenhome Dt_Customer
## 1 5524       1957 Graduation         Single  58138       0        0  04-09-2012
## 2 2174       1954 Graduation ...
77 sym R (11445 sym/57 pcs) 14 img
ISLR Exercise 6
Question 2 2. (a) A less flexible method will give improved prediction accuracy when its increase in bias is less than its decrease in variance. As lambda increases, the flexibility of the fit decreases, so the estimated coefficients shrink, with some becoming exactly zero. This leads to a substantial decrease in the variance of the predictions for a small increas...
1347 sym R (12894 sym/45 pcs) 6 img
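The shrinkage behaviour described in that answer can be checked numerically. A minimal sketch in Python (the assignment itself is in R; the closed-form ridge solution and all names below are illustrative, not taken from the original post):

```python
import numpy as np

def ridge_coefs(X, y, lam):
    """Closed-form ridge estimate: (X'X + lambda*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=100)

# As lambda grows, the coefficient vector shrinks toward zero,
# trading a little bias for a large reduction in variance.
norms = [np.linalg.norm(ridge_coefs(X, y, lam)) for lam in (0.0, 10.0, 1000.0)]
print(norms[0] > norms[1] > norms[2])  # True: norms decrease as lambda increases
```

With lambda = 0 this is ordinary least squares; ridge never zeroes coefficients exactly, which is the contrast with the lasso mentioned in the answer.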
Exercise 5 ISLR
Question 3 (a) Description & Implementation Question: Explain how k-fold cross-validation is implemented. Answer: The data is segmented into k distinct, (usually) equal-sized ‘folds’. A model is trained on k−1 of the folds and tested on the remaining fold. This process is repeated k times, such that each of the k folds acts as the test dat...
3446 sym R (4344 sym/35 pcs)
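The k-fold procedure described in that answer can be sketched directly. A small standard-library Python illustration (the original work is in R; function and variable names here are mine):

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate_splits(n, k):
    """Each fold serves as the test set exactly once;
    the remaining k-1 folds form the training set."""
    folds = kfold_indices(n, k)
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits

splits = cross_validate_splits(n=10, k=5)
print(len(splits))                                 # 5 train/test splits
print(sorted(splits[0][0] + splits[0][1]))         # together they cover all 10 points
```

Averaging the k test-fold errors gives the cross-validation estimate of the model's test error.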
ISLR Exercise 3.7
Question 1 2. Carefully explain the differences between the KNN classifier and KNN regression methods. Answer: The KNN classifier is used for classification problems, while the KNN regression method is used for continuous-variable (regression) problems. The KNN classifier assigns a point to the class held by the majority of its k nearest neighbours, while KNN regression esti...
4029 sym R (10949 sym/46 pcs) 6 img
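The classifier/regression distinction in that answer can be made concrete. A minimal hand-rolled Python sketch of both KNN variants (the assignment uses R; data and names below are illustrative):

```python
from collections import Counter

def knn(train, query, k):
    """Return the k training pairs nearest to query (1-D inputs for simplicity)."""
    return sorted(train, key=lambda t: abs(t[0] - query))[:k]

def knn_classify(train, query, k):
    # Majority vote among the k nearest neighbours' labels.
    votes = Counter(label for _, label in knn(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k):
    # Average of the k nearest neighbours' responses.
    return sum(y for _, y in knn(train, query, k)) / k

labelled = [(1.0, "A"), (1.2, "A"), (3.0, "B"), (3.1, "B"), (1.1, "A")]
numeric  = [(1.0, 10.0), (1.2, 12.0), (3.0, 30.0), (3.1, 31.0), (1.1, 11.0)]

print(knn_classify(labelled, 1.05, k=3))  # "A": majority of the 3 nearest labels
print(knn_regress(numeric, 3.05, k=2))    # 30.5: mean of the 2 nearest responses
```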
ISLR Exercise 4
Question 10
head(Weekly)
##   Year   Lag1   Lag2   Lag3   Lag4   Lag5    Volume  Today Direction
## 1 1990  0.816  1.572 -3.936 -0.229 -3.484 0.1549760 -0.270      Down
## 2 1990 -0.270  0.816  1.572 -3.936 -0.229 0.1485740 -2.576      Down
## 3 1990 -2.576 -0.270  0.816  1.572 -3.936 0.1598375  3.514        Up
## 4 1990  3.514 -2.576 -0.270 ...
982 sym R (16251 sym/108 pcs) 3 img
Assignment 7
Question 3
p = seq(0, 1, 0.01)
gini = 2 * p * (1 - p)
classerror = 1 - pmax(p, 1 - p)
crossentropy = -(p * log(p) + (1 - p) * log(1 - p))
plot(NA, NA, xlim = c(0, 1), ylim = c(0, 1), xlab = 'p', ylab = 'f')
lines(p, gini, type = 'l')
lines(p, classerror, col = 'blue')
lines(p, crossentropy, col = 'red')
legend(x = 'top', legend = c('gini', 'class error', 'cross entropy'), col = c('black', '...
193 sym R (4389 sym/39 pcs) 13 img 10 tbl
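The three node-impurity curves plotted in that R snippet follow simple formulas. A quick numeric check in Python (my own translation, not code from the assignment; entropy uses the natural log, matching the R `log`):

```python
import math

def gini(p):
    return 2 * p * (1 - p)

def class_error(p):
    return 1 - max(p, 1 - p)

def cross_entropy(p):
    # Undefined at p in {0, 1}; the limit there is 0.
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# All three impurity measures peak at p = 0.5, where the node is maximally mixed.
print(gini(0.5), class_error(0.5), round(cross_entropy(0.5), 4))  # 0.5 0.5 0.6931
```

This matches the plot: cross-entropy sits above the Gini index, and both dominate the classification error except at the endpoints.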
Assignment 8
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(ISLR)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## v purrr   0.3.4
## ...
385 sym R (8371 sym/97 pcs) 6 img 15 tbl
Assignment 6
custom_regression_metrics <- function(data, lev = NULL, model = NULL) {
  c(RMSE = sqrt(mean((data$obs - data$pred)^2)),
    Rsquared = summary(lm(pred ~ obs, data))$r.squared,
    MAE = mean(abs(data$obs - data$pred)),
    MSE = mean((data$obs - data$pred)^2),
    RSS = sum((data$obs - data$pred)^2))
}
ctrl <- trainControl(method = "cv", number ...
945 sym R (8474 sym/33 pcs) 4 img
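The metrics computed in that custom caret summary function are standard residual-based quantities. A minimal Python equivalent (names mine; this mirrors the RMSE/MAE/MSE/RSS arithmetic, not caret itself):

```python
def regression_metrics(obs, pred):
    """RMSE, MAE, MSE, and RSS for paired observed/predicted values."""
    resid = [o - p for o, p in zip(obs, pred)]
    n = len(resid)
    rss = sum(r * r for r in resid)   # residual sum of squares
    mse = rss / n                     # mean squared error
    return {
        "RMSE": mse ** 0.5,           # root mean squared error
        "MAE": sum(abs(r) for r in resid) / n,
        "MSE": mse,
        "RSS": rss,
    }

m = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.5, 2.5])
print(m["RSS"])  # 0.5: residuals are 0, -0.5, 0.5
```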