Publications by Zhi Ying Chen (Sec#1), Mengqin Cai (Sec#3), Fan Xu (Sec#4), Sin Ying Wong (Sec#4)

ProjectDraft_624

18.05.2021

Part 1: Load Data and EDA Load Data df<-read_excel('StudentData.xlsx') df_eval<-read_excel('StudentEvaluation.xlsx') df<-data.frame(df) df_eval<-data.frame(df_eval) Data Sample head(df) ## Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC ## 1 B 5.340000 23.96667 0.2633333 68.2 141.2 0....

1778 sym R (12481 sym/40 pcs) 7 img 3 tbl

Data624_HW10

09.05.2021

Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. A receipt records what went into a customer’s basket, hence the name ‘Market Basket Analysis’. That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1...

2517 sym R (11946 sym/17 pcs) 4 img

Data624_HW9

02.05.2021

8.1 Recreate the simulated data from Exercise 7.2: set.seed(200) simulated <- mlbench.friedman1(200, sd = 1) simulated <- cbind(simulated$x, simulated$y) simulated <- as.data.frame(simulated) colnames(simulated)[ncol(simulated)] <- "y" a. Fit a random forest model to all of the predictors, then estimate the variable importance scores: model1...

6993 sym R (10418 sym/62 pcs) 5 img

Data624_HW8

26.04.2021

Question 7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: \(y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\) where the x values are random variables uniformly distributed between 0 and 1. The package mlbench contains a function ...

2925 sym R (23808 sym/57 pcs) 3 img

Data624_HW7

19.04.2021

Question 6.2 Developing a model to predict permeability (see Sect. 1.4) could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug: a. Start R and use these commands to load the data: data(permeability) The matrix fingerprints cont...

4090 sym R (16204 sym/52 pcs) 7 img

Data622_Final

11.07.2021

Source Code: https://github.com/djlofland/DATA622_S2021_Group2/tree/master/FinalProject Load Data & EDA Information about the dataset Before loading our dataset and discussing our data exploration process, I’ll quickly summarize the dataset that we’ll be using for our machine learning analysis. The dataset is part of UCI’s Cleveland Hear...

11248 sym R (26130 sym/82 pcs) 25 img 4 tbl

Data608FinalProject

11.07.2021

We live in NYC and love this city, which is full of people from different countries. As a melting pot, NYC has a population of 8.3 million, and it also has the longest commute time via car and public transit. Nobody wants to talk about motor vehicle collisions, but they do happen, and they make our long daily commute even longer. Today, I will dig in...

4070 sym R (10894 sym/43 pcs) 6 img

Housing Price Under the Influence of Covid-19 Pandemic

17.12.2021

ABSTRACT Housing price is always a critical index for economic recovery. I investigated whether the Covid-19 pandemic affected the real estate prices of residential dwellings across New York City and built a model to predict the sale price of a house given certain properties of the house. In this project, I explored, analyzed, and modeled a data se...

26653 sym R (16134 sym/14 pcs) 11 img 3 tbl

Data609_HW8

13.12.2021

Ex1 Use the nnet package to analyze the iris data set. Use 80% of the 150 samples as the training data and the rest for validation. Discuss the results. data("iris") irisdf<-iris set.seed(759) index<-createDataPartition(irisdf$Species,p=0.8,list = FALSE) train<-irisdf[index,] test<-irisdf[-index,] nnetModel<-nnet(Species~.,data=train,size=2...

622 sym R (4722 sym/13 pcs) 1 img

Data 622 - Final Project

26.09.2021

Source Code: https://github.com/djlofland/DATA622_S2021_Group2/tree/master/FinalProject Load Data & EDA Information about the dataset Before loading our dataset and discussing our data exploration process, we’ll quickly summarize the dataset that we’ll be using for our machine learning analysis. The dataset is part of UCI’s Cleveland Hea...

23792 sym R (30270 sym/75 pcs) 18 img 5 tbl