Publications by Zhi Ying Chen (Sec#1), Mengqin Cai (Sec#3), Fan Xu (Sec#4), Sin Ying Wong (Sec#4)

DATA 607 Final Project

12.12.2019

Introduction New York City is one of the most famous places in the world. It draws millions of tourists every year, which boosts our economy. NYC is therefore one of the hottest markets for Airbnb. Compared to other nearby cities, New York City is easy to commute in, with wide subway coverage, various bus lines, and Citi Bikes. self-gui...

9074 sym R (10226 sym/31 pcs) 4 img

Data 606 - Data Project

12.12.2019

Introduction Most friends around me have chosen either STEM or Business related majors instead of Liberal Arts. Also, Liberal Arts majors are often considered hard-to-get-a-job majors. The most popular dream jobs of kids are usually doctor, lawyer, or engineer. All these reasons raise my interest in investigating the relationship between m...

9693 sym R (10853 sym/50 pcs) 14 img

Data 607 Assignment: Tidyverse Vignette Part 1

01.12.2019

Assignment Description In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions. GitHub repository: https://github.com/acatlin/FALL2019TIDYVERSE FiveThirtyEight.com datasets. Kaggle datasets. You have two tasks...

7116 sym R (5527 sym/33 pcs) 1 img

Data607_Tidyverse_Vignette_Part_2

02.12.2019

Part 1 by Rosemond title: “Tidyverse Part 1” author: “C. Rosemond” date: “November 2, 2019” output: html_document Library library(tidyverse) Data Set(s) I selected two fivethirtyeight data sets: one that contains current Soccer Power Index (SPI) ratings and rankings for men’s club teams and a second that contains match-by-match SP...

3806 sym R (2151 sym/15 pcs) 1 img

Data 606 Lab 9 - Multiple linear regression

25.11.2019

Grading the professor Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related charac...

9502 sym R (18433 sym/54 pcs) 16 img 1 tbl

DATA 606 Assignment 9 - Multiple and Logistic Regression

25.11.2019

Baby weights, Part I. (9.1, p. 350) The Child Health and Development Studies investigate a range of topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The variable smoke is ...

6732 sym R (3382 sym/22 pcs) 2 img

DATA_612_Project_2_Content-Based and Collaborative Filtering

24.06.2020

DATA 612 Project 2 - Content-Based and Collaborative Filtering Instruction Introduction Load Packages Read Data Data Exploration Data Sampling Building the Recommendation Models User-Based Collaborative Filtering Models Item-Based Collaborative Filtering Models Summary Sin Ying Wong, Zhi Ying Chen, Fan Xu 6/13/2020 ...

1175 sym R (350 sym/2 pcs)

DATA_612_Project_3_Matrix Factorization Methods

24.06.2020

DATA 612 Project 3 - Matrix Factorization Methods Instruction Introduction Load Packages Read Data Data Exploration Handle the Missing Values Singular Value Decomposition (SVD) Dimensionality Reduction Find k Reduce Matrices’ Dimensionality Best Low Rank Approximation SVD Accuracy Evaluation Build Recommendation Mode...

9775 sym R (14548 sym/29 pcs) 8 img 4 tbl

DATA_612_Project_1_Global Baseline Predictors and RMSE

24.06.2020

DATA 612 Project 1 - Global Baseline Predictors and RMSE Instruction Introduction Load Packages Read Data Data Exploration Separate Training Dataset & Test Dataset Create a User-Item Matrix Calculate Raw Average Rating Calculate RMSE for Raw Average Calculate the bias for each user and each item Calculate the Baseline P...

3905 sym R (8524 sym/22 pcs) 1 img 5 tbl

DATA_612_Final Project

17.07.2020

DATA 612 Final Project 1 Project Goal 2 Introduction 2.1 Note 3 Load Library 4 Build Model in RecommenderLab 4.1 Import Small MovieLens Dataset 4.1.1 Read data from CSV 4.2 Data Exploration 4.2.1 Data: Ratings 4.2.2 Data: Movie 4.2.3 User Similarity 4.2.4 Item Similarity 4.3 Select Sample Data 4.4 Build Models 4.4.1 ...

9243 sym R (12378 sym/59 pcs) 16 img 1 tbl