Publications by Amit Kapoor
Data 621 - HW3
Introduction In this homework assignment, you will explore, analyze and model a data set containing information on crime for various neighborhoods of a major city. Each record has a response variable indicating whether or not the crime rate is above the median crime rate (1) or not (0). Your objective is to build a binary logistic regression mode...
9004 sym R (22333 sym/79 pcs) 23 img 1 tbl
Data624 - Homework8
library(AppliedPredictiveModeling) library(tidyverse) library(caret) library(mlbench) library(naniar) Exercise 7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: \[y = 10 \sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2...
7739 sym R (37664 sym/103 pcs) 19 img
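As a rough illustration of the simulation described in the Homework8 excerpt, the base R sketch below generates data from the terms of the equation that are visible above. The helper name `simulate_friedman`, the use of independent Uniform(0, 1) predictors, the extra uninformative columns, and the default `sd = 1` are assumptions made for illustration; the truncated excerpt does not show them. The mlbench package loaded in the excerpt offers `mlbench.friedman1()` for generating this kind of benchmark data.

```r
# Hypothetical helper: simulate data from the Friedman benchmark equation.
# Assumptions (not shown in the truncated excerpt): predictors are
# independent Uniform(0, 1), only the first five enter the equation,
# and the noise standard deviation defaults to 1.
simulate_friedman <- function(n, p = 10, sd = 1) {
  stopifnot(p >= 5)
  x <- matrix(runif(n * p), nrow = n, ncol = p)
  colnames(x) <- paste0("X", seq_len(p))
  y <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] +
    5 * x[, 5] +
    rnorm(n, mean = 0, sd = sd)
  data.frame(x, y = y)
}

set.seed(200)
train <- simulate_friedman(200)
str(train)
```

Setting `sd = 0` returns the noise-free response, which can be handy for checking how closely a fitted model recovers the underlying function.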
Data 621 - HW4
Overview In this homework assignment, you will explore, analyze and model a data set containing approximately 8000 records, each representing a customer at an auto insurance company. Each record has two response variables. The first response variable, TARGET_FLAG, is a 1 or a 0. A “1” means that the person was in a car crash. A zero means that the ...
7415 sym R (169449 sym/58 pcs) 20 img 2 tbl
Data621 - Blog2
Non Linear Regression Non-linear regression is a method to model a non-linear relationship between the dependent variable and independent variable(s). It is a regression technique in which the dependent variable is modeled as a non-linear function of one or more independent variables. Simple linear regression shows the relationship between two ...
3867 sym R (4396 sym/8 pcs)
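To make the idea in the Blog2 excerpt concrete, here is a minimal sketch of a non-linear fit with base R's `nls()`. The simulated data, the exponential form `y = a * exp(b * x)`, and the starting values are all assumptions chosen for illustration; they are not taken from the original post.

```r
# Simulated data for illustration only; the true curve y = 2 * exp(0.3 * x)
# and the noise level are assumptions, not values from the original post.
set.seed(621)
x <- seq(0, 10, length.out = 50)
y <- 2 * exp(0.3 * x) + rnorm(length(x), sd = 0.5)
sim <- data.frame(x = x, y = y)

# Fit y = a * exp(b * x) by non-linear least squares.
# nls() needs starting values; these are rough guesses near the truth.
fit_nls <- nls(y ~ a * exp(b * x), data = sim,
               start = list(a = 1.5, b = 0.25))
coef(fit_nls)

# A straight-line fit to the same data, to compare residual sums of squares.
fit_lm <- lm(y ~ x, data = sim)
c(nls = sum(resid(fit_nls)^2), lm = sum(resid(fit_lm)^2))
```

Because the response is a non-linear function of the parameters, `nls()` iterates from the supplied starting values rather than solving in closed form, so poor starting values can prevent convergence.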
Data621 - Blog3
Regression Trees Tree-based models consist of one or more nested conditional statements for the predictors that partition the data. Within these partitions, a model is used to predict the outcome. In the tree models terminology, there are two splits of the data into three terminal nodes or leaves of the tree. To get a prediction for new data, we ...
4242 sym R (5136 sym/10 pcs)
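The "nested conditional statements" in the Blog3 excerpt can be sketched directly in R. The function below mimics a tree with two splits and three terminal nodes; the predictor names, split points, and leaf values are hypothetical and do not come from the original post. In a fitted regression tree, each leaf value would typically be the mean outcome of the training samples that fall in that partition.

```r
# Hypothetical two-split tree: the predictors (x1, x2), the split points,
# and the leaf values are made up for illustration.
predict_tree <- function(x1, x2) {
  ifelse(x1 < 5,
         10,                       # terminal node 1: x1 < 5
         ifelse(x2 < 2, 20, 30))   # terminal nodes 2 and 3: x1 >= 5
}

# Each new sample is routed through the conditions to a single leaf.
predict_tree(x1 = c(3, 7, 7), x2 = c(0, 1, 4))
# returns 10 20 30
```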
Data621 - Blog4
Inference vs Prediction The terms Inference and Prediction are used extensively in the data science community. Inference uses the model to understand and learn about the data generation process, while Prediction uses the model to predict the outcomes for new samples. Inference is the information learned about the data generating process. On the other h...
2760 sym 2 img
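The distinction drawn in the Blog4 excerpt can be illustrated with a single linear model used in both ways. This is only a sketch: the built-in `cars` data set and the simple `dist ~ speed` model are assumptions chosen for convenience, not the example used in the original post.

```r
# Built-in `cars` data (speed, dist), used purely for illustration.
fit <- lm(dist ~ speed, data = cars)

# Inference: examine the estimated coefficients and their uncertainty
# to learn about the relationship between speed and stopping distance.
coef(summary(fit))
confint(fit)

# Prediction: use the same fitted model to estimate outcomes for new samples.
new_cars <- data.frame(speed = c(10, 20))
preds <- predict(fit, newdata = new_cars)
preds
```

The same fitted object serves both purposes; what changes is whether the interest lies in the coefficients themselves or in the values the model produces for unseen inputs.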
Data 621 - HW5
Overview In this assignment, we will explore, analyze and model a data set containing information on approximately 12,000 commercially available wines. The variables are mostly related to the chemical properties of the wine being sold. The response variable is the number of sample cases of wine that were purchased by wine distribution companies a...
7618 sym R (22326 sym/21 pcs) 6 img 2 tbl
Data 609 - Module8
library(nnet) library(caret) ## Loading required package: lattice ## Loading required package: ggplot2 Ex.1 Use the nnet package to analyze the iris data set. Use 80% of the 150 samples as the training data and the rest for validation. Discuss the results. Solution # iris dataset data(iris) set.seed(111) partition <- sample(nrow(iris), nrow(ir...
2180 sym R (6227 sym/32 pcs) 2 img
Data622 - FinalProject
Overview As income inequality grows throughout the world, by understanding the relationships between an individual's income and the other factors in this study, we can better identify and address the underlying causes of the inequalities. This study will analyze how 15 factors such as age, county, working class, sex, race, education and more influenc...
27153 sym R (64284 sym/23 pcs) 29 img 2 tbl
Data 609 - Module7
# Libraries library(e1071) library(caret) ## Loading required package: lattice ## Loading required package: ggplot2 Ex.1 Use the svm() algorithm of the e1071 package to carry out the support vector machine for the PlantGrowth data set. Then, discuss the number of support vectors/samples. [Install the e1071 package in R if needed.] Solution # Pla...
1040 sym R (3510 sym/15 pcs)