Publications by Abha Jha - wnn231
Customer Retention Case Study
library(SMCRM) # CRM data library(dplyr) # data wrangling library(tidyr) # data wrangling library(ggplot2) # plotting library(survival) # survival library(rpart) # DT library(randomForestSRC) # RF library(tidyverse) library(tree) library(Metrics) library(caret) library(car) library(kernlab) library(MASS) library(performance) librar...
15372 sym 12 img
Algorithms Homework 2
library(DescTools) library(MASS) library(car) Exercise 1: Analysis of Variance The heartbpchol.csv data set contains a continuous cholesterol variable (Cholesterol) and blood pressure status (BP_Status; categories: High/Normal/Optimal) for living patients. For the heartbpchol.xlsx data set, consider a one-way ANOVA model to identify differences between g...
16036 sym R (10366 sym/66 pcs) 5 img
Algorithms Homework 1
Homework 1 Author: Abha Jha abc123: wnn231 install.packages("dplyr") output: html_document: default library(e1071) library(fBasics) library(tidyverse) library(devtools) library(dplyr) Exercise 1 a) MPGCombo = (c(CARS$MPG_City * 0.4))+(c(CARS$MPG_Highway * 0.6)) CARS = data.frame(CARS, MPGCombo) boxplot(MPGCombo, main = "MPG Combined (...
5849 sym
Algorithms Homework 3
Exercise 1 ## 'data.frame': 3134 obs. of 4 variables: ## $ Weight : int 132 158 156 131 136 194 179 151 174 155 ... ## $ Diastolic : int 90 80 76 92 80 68 76 68 90 90 ... ## $ Systolic : int 170 128 110 176 112 132 128 108 142 130 ... ## $ Cholesterol: int 250 242 281 196 196 211 225 221 188 292 ... Fitting the linear model...
7588 sym 8 img
Algorithms Homework 4
Exercise 1: The liver data set is a subset of the ILPD (Indian Liver Patient Dataset) data set. It contains the first 10 variables described on the UCI Machine Learning Repository and a LiverPatient variable (indicating whether or not the individual is a liver patient). People with active liver disease are coded as LiverPatient=1 and people withou...
14771 sym 8 img
Algorithms Midterm Exam
Data Sets: You need to download the dataset birthweight.csv for Exercises 1-4. The birthweight data record live, singleton births to mothers between the ages of 18 and 45 in the United States who were classified as black or white. There are a total of 295 observations in birthweight, and the variables are: Weight: Infant birth weight (grams) Black: Categori...
9421 sym 4 img
Algorithms Final Exam
Data Sets: You need to download the dataset birthweight_final.csv. The data record live, singleton births to mothers between the ages of 18 and 45 in the United States who were classified as black or white. There are a total of 400 observations in birthweight, and the variables are: Weight: Infant birth weight (grams) Weight_Gr: Categorical variable for in...
8603 sym 1 img
Dow Jones Case Study
2. Data dowjones = read.table("dow_jones_index.data", sep = ",", header = TRUE) str(dowjones) ## 'data.frame': 750 obs. of 16 variables: ## $ quarter : int 1 1 1 1 1 1 1 1 1 1 ... ## $ stock : chr "AA" "AA" "AA" "AA" ... ## $ date : chr "1/7/2011" "1/1...
2509 sym 5 img
STA 6543 - Predictive Modeling Assignment 5
Problem 2 For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer. (a) The lasso, relative to least squares, is: i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance. ii. More flexible and hence will give improved prediction accuracy when it...
7155 sym R (8396 sym/58 pcs) 7 img
STA 6543 - Predictive Modeling Assignment 4
Problem 3 We now review k-fold cross-validation. (a) Explain how k-fold cross-validation is implemented. This approach involves randomly dividing the set of observations into k groups or folds of approximately equal size. The first fold is treated as a validation set and the method is fit on the remaining (k-1) folds. The mean squared error is co...
8413 sym R (10032 sym/48 pcs)
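The k-fold procedure described in the Assignment 4 preview can be sketched in a few lines of R. This is only an illustrative sketch, not the author's solution: the built-in mtcars data, the model mpg ~ wt, and k = 5 are all assumptions chosen for the example, and the steps after fitting (computing the error on the held-out fold, repeating for every fold, and averaging) follow the standard procedure rather than the truncated preview text.

```r
# Minimal k-fold cross-validation sketch.
# Assumptions (not from the original): mtcars data, lm(mpg ~ wt), k = 5.
set.seed(1)                        # reproducible fold assignment
k <- 5                             # number of folds
n <- nrow(mtcars)

# Randomly divide the observations into k folds of approximately equal size
folds <- sample(rep(seq_len(k), length.out = n))

fold_mse <- numeric(k)
for (i in seq_len(k)) {
  train <- mtcars[folds != i, ]    # fit on the remaining k - 1 folds
  valid <- mtcars[folds == i, ]    # fold i is the validation set
  fit   <- lm(mpg ~ wt, data = train)
  pred  <- predict(fit, newdata = valid)
  fold_mse[i] <- mean((valid$mpg - pred)^2)   # MSE on the held-out fold
}

# The k-fold CV estimate is the average of the k validation MSEs
cv_mse <- mean(fold_mse)
cv_mse
```

Each observation lands in exactly one validation fold, so every row is used for both fitting and validation, but never in the same iteration.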