Publications by Leo Yi & Christopher Bloome
Data 605 Discussion 13
Topic Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not? Source We’ll be looking at the air...
959 sym R (2917 sym/18 pcs) 3 img
Data 605 Discussion 12
High School Math Proficiency in NYC I was looking for data to use for this week’s discussion and stumbled on this site that has information about New York City residents. We’ll be looking to see if median household income and the unemployment rate can predict the levels of high school math proficiency. Found here Loading Packages library(tid...
1032 sym R (6102 sym/22 pcs) 3 img
Data 605 Discussion 11
Oranges This built-in dataset follows 5 different orange trees and records each tree's circumference in millimeters by age, measured in days since 12/31/1968. Here we’ll see if we can use this data to create a linear model to predict the circumference based on the age. data(Orange) head(Orange) ## Tree age circumference ## 1 1 118 30 ## 2 ...
263 sym R (1306 sym/6 pcs) 2 img
Data624 HW6
Recommender Systems Market Basket Analysis # import data url <- 'https://raw.githubusercontent.com/dataconsumer101/data624/main/GroceryDataSet.csv' df <- read.csv(url, header = F, na.strings=c("")) # convert column names to lowercase names(df) <- lapply(names(df), tolower) # add row index as new field df$row <- row.names(df) %>% as.numeric()...
1992 sym R (17766 sym/31 pcs) 10 img 1 tbl
Data624 HW 4
Linear Regression and its Cousins KJ 6.3 A chemical manufacturing process for a pharmaceutical product was discussed in Sect. 1.4. In this problem, the objective is to understand the relationship between biological measurements of the raw materials (predictors), measurements of the manufacturing process (predictors), and the response of product ...
2228 sym R (4246 sym/14 pcs) 4 img
Data624 HW3
Data Pre-Processing and Exponential Smoothing HA 8.1 Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers. a Explain the differences among these figures. Do they all indicate that the data are white noise? Each series' ACF plot shows autocorrelation between different periods in black, with blue dotted li...
3177 sym R (2607 sym/33 pcs) 27 img
Data624 Project 1
Group Members Subhalaxmi Rout Kenan Sooklall Devin Teran Christian Thieme Leo Yi Getting The Data url <- 'https://raw.githubusercontent.com/christianthieme/Predictive-Analytics/main/data624_project1_dataset.csv' df <- read.csv(url) glimpse(df) ## Rows: 10,572 ## Columns: 7 ## $ ï..SeriesInd <int> 40669, 40669, 40669, 40669, 40669, 40669, 4...
670 sym R (4367 sym/19 pcs) 7 img
Data624 Project 1
Group Members Subhalaxmi Rout Kenan Sooklall Devin Teran Christian Thieme Leo Yi Getting The Data url <- 'https://raw.githubusercontent.com/christianthieme/Predictive-Analytics/main/Project1-TimeSeries/data624_project1_dataset.csv' df <- read.csv(url) glimpse(df) ## Rows: 10,572 ## Columns: 7 ## $ ï..SeriesInd <int> 40669, 40669, 40669, 40...
1247 sym R (7770 sym/34 pcs) 8 img
Data624 Group 4 Project 1 Report
Group Members Subhalaxmi Rout Kenan Sooklall Devin Teran Christian Thieme Leo Yi Get The Data The dataset for this project was provided to us in Excel format, which made it relatively easy to ingest in R. Some of us pointed to the dataset from our local machines, and others uploaded a converted CSV to GitHub, importing the data from the raw link ...
6085 sym R (2516 sym/8 pcs) 9 img
Data624 HW5
Nonlinear Regression Models, Regression Trees and Rules-Based Models KJ 7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: \[ y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2) \] where the x values are random variables...
7574 sym R (22841 sym/62 pcs) 6 img