Publications by Amit Kapoor

Data 621 - HW3

18.04.2021

Introduction In this homework assignment, you will explore, analyze and model a data set containing information on crime for various neighborhoods of a major city. Each record has a response variable indicating whether the crime rate is above the median crime rate (1) or not (0). Your objective is to build a binary logistic regression mode...

9004 sym R (22333 sym/79 pcs) 23 img 1 tbl

Data624 - Homework8

25.04.2021

library(AppliedPredictiveModeling) library(tidyverse) library(caret) library(mlbench) library(naniar) Exercise 7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: \[y = 10 \sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2...

7739 sym R (37664 sym/103 pcs) 19 img

Data 621 - HW4

02.05.2021

Overview In this homework assignment, you will explore, analyze and model a data set containing approximately 8000 records, each representing a customer at an auto insurance company. Each record has two response variables. The first response variable, TARGET_FLAG, is a 1 or a 0. A “1” means that the person was in a car crash. A zero means that the ...

7415 sym R (169449 sym/58 pcs) 20 img 2 tbl

Data621 - Blog2

11.05.2021

Non-Linear Regression Non-linear regression is a method to model a non-linear relationship between the dependent variable and independent variable(s). It is a regression technique in which the dependent variable is modeled as a non-linear function of one or more independent variables. Simple linear regression shows the relationship between two ...

3867 sym R (4396 sym/8 pcs)

Data621 - Blog3

12.05.2021

Regression Trees Tree-based models consist of one or more nested conditional statements for the predictors that partition the data. Within these partitions, a model is used to predict the outcome. In tree-model terminology, there are two splits of the data into three terminal nodes or leaves of the tree. To get a prediction for new data, we ...

4242 sym R (5136 sym/10 pcs)

Data621 - Blog4

14.05.2021

Inference vs Prediction The terms Inference and Prediction are used extensively in the data science community. Inference uses the model to understand and learn about the data generation process, while Prediction uses the model to predict the outcomes for new samples. Inference is the information learned about the data generating process. On the other h...

2760 sym 2 img

Data 621 - HW5

23.05.2021

Overview In this assignment, we will explore, analyze and model a data set containing information on approximately 12,000 commercially available wines. The variables are mostly related to the chemical properties of the wine being sold. The response variable is the number of sample cases of wine that were purchased by wine distribution companies a...

7618 sym R (22326 sym/21 pcs) 6 img 2 tbl

Data 609 - Module8

11.12.2021

library(nnet) library(caret) ## Loading required package: lattice ## Loading required package: ggplot2 Ex.1 Use the nnet package to analyze the iris data set. Use 80% of the 150 samples as the training data and the rest for validation. Discuss the results. Solution # iris dataset data(iris) set.seed(111) partition <- sample(nrow(iris), nrow(ir...

2180 sym R (6227 sym/32 pcs) 2 img

Data622 - FinalProject

10.12.2021

Overview As income inequality grows throughout the world, by understanding the relationships between an individual's income and the other factors in this study we can better identify and address the underlying causes of the inequalities. This study will analyze how 15 factors such as age, county, working class, sex, race, education and more influenc...

27153 sym R (64284 sym/23 pcs) 29 img 2 tbl

Data 609 - Module7

03.12.2021

# Libraries library(e1071) library(caret) ## Loading required package: lattice ## Loading required package: ggplot2 Ex.1 Use the svm() algorithm of the e1071 package to carry out the support vector machine for the PlantGrowth data set. Then, discuss the number of support vectors/samples. [Install the e1071 package in R if needed.] Solution # Pla...

1040 sym R (3510 sym/15 pcs)