Publications by Zhi Ying Chen (Sec#1), Mengqin Cai (Sec#3), Fan Xu (Sec#4), Sin Ying Wong (Sec#4)
ProjectDraft_624
Part 1: Load Data and EDA Load Data df<-read_excel('StudentData.xlsx') df_eval<-read_excel('StudentEvaluation.xlsx') df<-data.frame(df) df_eval<-data.frame(df_eval) Data Sample head(df) ## Brand.Code Carb.Volume Fill.Ounces PC.Volume Carb.Pressure Carb.Temp PSC ## 1 B 5.340000 23.96667 0.2633333 68.2 141.2 0....
1778 sym R (12481 sym/40 pcs) 7 img 3 tbl
Data624_HW10
Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. The receipt is a representation of what went into a customer’s basket, hence the name ‘Market Basket Analysis’. That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1...
2517 sym R (11946 sym/17 pcs) 4 img
Data624_HW9
8.1 Recreate the simulated data from Exercise 7.2: set.seed(200) simulated <- mlbench.friedman1(200, sd = 1) simulated <- cbind(simulated$x, simulated$y) simulated <- as.data.frame(simulated) colnames(simulated)[ncol(simulated)] <- "y" a. Fit a random forest model to all of the predictors, then estimate the variable importance scores: model1...
6993 sym R (10418 sym/62 pcs) 5 img
Data624_HW8
Question 7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: \(y=10\sin(\pi x_1 x_2)+20(x_3-0.5)^2+10x_4+5x_5+N(0,\sigma^2)\) where the x values are random variables uniformly distributed between 0 and 1. The package mlbench contains a function ...
2925 sym R (23808 sym/57 pcs) 3 img
Data624_HW7
Question 6.2 Developing a model to predict permeability (see Sect. 1.4) could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug: a. Start R and use these commands to load the data: data(permeability) The matrix fingerprints cont...
4090 sym R (16204 sym/52 pcs) 7 img
Data622_Final
Source Code: https://github.com/djlofland/DATA622_S2021_Group2/tree/master/FinalProject Load Data & EDA Information about the dataset Before loading our dataset and discussing our data exploration process, I’ll quickly summarize the dataset that we’ll be using for our machine learning analysis. The dataset is part of UCI’s Cleveland Hear...
11248 sym R (26130 sym/82 pcs) 25 img 4 tbl
Data608FinalProject
We live in NYC and love this city, which is full of people from different countries. As a melting pot, NYC has a population of 8.3 million, and it also has the longest commute time via car and public transit. Nobody wants to talk about motor vehicle collisions, but they do exist and make our daily long commute even longer when they happen. Today, I will dig in...
4070 sym R (10894 sym/43 pcs) 6 img
Housing Price Under the Influence of Covid-19 Pandemic
ABSTRACT Housing price is always a critical index for economic recovery. I investigated whether the Covid-19 pandemic affected the real estate prices of residential dwellings across New York City and built a model to predict the sale price of a house given certain of its properties. In this project, I explored, analyzed, and modeled a data se...
26653 sym R (16134 sym/14 pcs) 11 img 3 tbl
Data609_HW8
Ex1 Use the nnet package to analyze the iris data set. Use 80% of the 150 samples as the training data and the rest for validation. Discuss the results. data("iris") irisdf<-iris set.seed(759) index<-createDataPartition(irisdf$Species,p=0.8,list = FALSE) train<-irisdf[index,] test<-irisdf[-index,] nnetModel<-nnet(Species~.,data=train,size=2...
622 sym R (4722 sym/13 pcs) 1 img
Data 622 - Final Project
Source Code: https://github.com/djlofland/DATA622_S2021_Group2/tree/master/FinalProject Load Data & EDA Information about the dataset Before loading our dataset and discussing our data exploration process, we’ll quickly summarize the dataset that we’ll be using for our machine learning analysis. The dataset is part of UCI’s Cleveland Hea...
23792 sym R (30270 sym/75 pcs) 18 img 5 tbl