Publications by Devin Teran, Gabe Abreu, Amit Kapoor, Subhalaxmi Rout

Data 605 Assignment 15

10.12.2020

Data 605 HW #15

Find the equation of the regression line for the given points. Round any final values to the nearest hundredth, if necessary.

(5.6, 8.8), (6.3, 12.4), (7, 14.8), (7.7, 18.2), (8.4, 20.8)

pt1 <- c(5.6, 6.3, 7, 7.7, 8.4)
pt2 <- c(8.8, 12.4, 14.8, 18.2, 20.8)
reg <- lm(pt2 ~ pt1)
summary(reg)
##
## Call:
## lm(form...

1389 sym R (1074 sym/10 pcs) 3 img
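For readers who want to see where the coefficients reported by `lm()` come from, the following is a minimal sketch that computes the slope and intercept for the five points above with the closed-form least-squares formulas and then cross-checks them against `lm()`. The variable names `x`, `y`, `slope`, and `intercept` are illustrative and not taken from the original assignment.

```r
# Points from the assignment (x = first coordinate, y = second coordinate)
x <- c(5.6, 6.3, 7, 7.7, 8.4)
y <- c(8.8, 12.4, 14.8, 18.2, 20.8)

# Closed-form least-squares estimates:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept = mean(y) - slope * mean(x)
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

# Cross-check against lm(); coef() returns the intercept first, then the slope
fit <- lm(y ~ x)
stopifnot(isTRUE(all.equal(unname(coef(fit)), c(intercept, slope))))

# Round to the nearest hundredth, as the assignment asks
round(c(intercept = intercept, slope = slope), 2)
```

Rounded to the nearest hundredth, this gives a line of approximately y = -14.80 + 4.26x.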

Data 608 HW1

15.02.2021

Principles of Data Visualization and Introduction to ggplot2

I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in:

library(psych)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## Th...

2072 sym R (9352 sym/31 pcs) 5 img

Data 621 Blog 1

22.05.2021

library(MASS)

Stepwise Regression

One of the first concepts learned as a data scientist is regression, specifically Simple Linear Regression. As data scientists gain experience and are exposed to a variety of data sets, they quickly realize that Simple Linear Regression will either not suffice or not be the optimal approach to solve a proble...

2319 sym R (10885 sym/20 pcs)
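As a rough illustration of the technique this post introduces, the sketch below runs stepwise selection with `stepAIC()` from the MASS package. The `mtcars` data set and the choice of `mpg` as the response are assumptions made purely for the example; the excerpt above does not show which data the post actually uses.

```r
library(MASS)

# Assumption: the built-in mtcars data stands in for the post's data set
full_model <- lm(mpg ~ ., data = mtcars)

# Stepwise selection in both directions, starting from the full model.
# At each step, stepAIC() adds or drops the term that lowers AIC the most
# and stops when no single change improves AIC.
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)

# Compare the number of coefficients and the AIC before and after selection
length(coef(full_model))
length(coef(step_model))
AIC(full_model)
AIC(step_model)
```

Setting `trace = FALSE` only suppresses the per-step log; leaving it at the default prints each candidate model considered along the way.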

Data 621 Blog 2

22.05.2021

Data Exploration

One of the initial steps in solving any data science problem is the process of data exploration. Data exploration entails finding missing values, outliers, and data distributions, along with producing some visualizations to get an overall picture. We are going to use Kaggle’s House Price competition data to perform data exploration using t...

1994 sym R (10420 sym/14 pcs) 10 img

Data 621 HW#3

22.05.2021

INTRODUCTION The aim of this assignment is to build a binary logistic regression model to predict whether a neighborhood will be at risk for high crime levels, using a data set containing information on crime for various neighborhoods of a major city. Each record has a response variable indicating whether or not the crime rate is above the median...

10283 sym R (8175 sym/27 pcs) 9 img 9 tbl

Data 621 HW#1

22.05.2021

Abstract

To see how regression can help us evaluate baseball team performance, this project explores whether a team's success in any given season can be predicted or explained by any number of statistics in that season. Our goal is to build a multiple linear regression model on the training data to predict the number of wins for the...

13361 sym R (13529 sym/14 pcs) 11 img 6 tbl

Data 621 HW#2

22.05.2021

Objective

Classification is the process of predicting a categorical label of a data object based on its features and properties. In this assignment we created R functions to calculate several different classification metrics. We also verified the functions by checking R package implementations against our output. Lastly, we graphed...

4854 sym R (4432 sym/33 pcs) 2 img 3 tbl
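To make the idea of hand-written classification metrics concrete, here is a minimal sketch of one such function. The name `classification_metrics`, its arguments, and the toy vectors are hypothetical and are not the authors' implementation; the sketch simply counts the four confusion-matrix cells and derives the standard metrics from them.

```r
# Hypothetical helper: computes common metrics from actual and predicted labels.
# `positive` is the label treated as the positive class.
classification_metrics <- function(actual, predicted, positive = 1) {
  tp <- sum(actual == positive & predicted == positive)  # true positives
  tn <- sum(actual != positive & predicted != positive)  # true negatives
  fp <- sum(actual != positive & predicted == positive)  # false positives
  fn <- sum(actual == positive & predicted != positive)  # false negatives

  precision   <- tp / (tp + fp)
  sensitivity <- tp / (tp + fn)  # also called recall
  specificity <- tn / (tn + fp)

  c(
    accuracy    = (tp + tn) / (tp + tn + fp + fn),
    precision   = precision,
    sensitivity = sensitivity,
    specificity = specificity,
    f1          = 2 * precision * sensitivity / (precision + sensitivity)
  )
}

# Toy labels, invented for illustration only
actual    <- c(1, 1, 1, 0, 0, 0, 1, 0)
predicted <- c(1, 0, 1, 0, 1, 0, 1, 0)
metrics   <- classification_metrics(actual, predicted)
metrics
```

The sketch does not guard against empty classes: if no positives are predicted, for example, precision divides by zero and becomes `NaN`, so a production version would need to handle that case explicitly.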

Data 621 HW#4

22.05.2021

Overview

The goal of this assignment is to explore, analyze, and model a data set containing approximately 8000 records, each representing a customer at an auto insurance company. Each record has two response variables. The first response variable, TARGET_FLAG, is a 1 or a 0. A “1” means that the person was in a car crash. A zero means that the pers...

3138 sym R (113819 sym/21 pcs) 9 img

Data 621 Blog 3

23.05.2021

Data Imputation

The last post discussed data exploration through data visualizations. In the exploratory phase, you may come across columns with varying proportions of missing values. Columns with a high percentage of missing values should be removed, since imputing them would mean substituting too much of the data. Columns with small percentages could be either ignored o...

1539 sym R (3544 sym/17 pcs) 2 img

Data 624 HW 9

22.11.2021

Data 624 HW9
11/21/2021
Gabe Abreu

8.1

Recreate the simulated data from Exercise 7.2:

set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"

Fit a random forest model to all of the predictors, th...

6181 sym R (6298 sym/33 pcs) 3 img 1 tbl