Publications by Zhi Ying Chen (Sec#1), Mengqin Cai (Sec#3), Fan Xu (Sec#4), Sin Ying Wong (Sec#4)

DATA 607 - Final Project

12.12.2019

Introduction: New York City is one of the most famous places in the world. It draws millions of tourists every year, which boosts our economy. NYC is therefore one of the hottest markets for Airbnb. Compared to other nearby cities, New York City offers an easy commute, with extensive subway coverage, various bus lines, and Citi Bikes. self-gui...

9074 sym R (10226 sym/31 pcs) 4 img

Data 606 - Data Final Project

11.12.2019

Introduction: With the rapid development of science and technology, STEM majors are becoming more popular than other majors. Given the growing need for science-related jobs, I would like to study the relationship between median income and different major categories in STEM, business, and liberal arts. Also, I would like to study the relationship be...

9114 sym R (15644 sym/40 pcs) 17 img

Data 607 - TidyVerse - Part 1

04.12.2019

library(tidyverse) Dataset: The dataset I used is the Border Crossing Entry Data from https://www.kaggle.com/datasets. To reduce the size of the data, I selected records from 2002 to 2019. readr–read_csv: read_csv from readr (a package in the tidyverse) imports CSV files faster than the R default function read.cs...

2970 sym R (2066 sym/14 pcs) 2 img

Data 607 - TidyVerse - Part 2

04.12.2019

TidyVerse Part 1 by Sie Siong Wong. Objective: dplyr and ggplot2 are the TidyVerse packages I chose for a programming sample that demonstrates their capabilities, from reshaping the data to plotting the analysis results. The dataset is Ramen Rating, obtained from Kaggle. What is dplyr? It is the next iteration of the plyr package. It is fast...

1630 sym R (5469 sym/19 pcs) 5 img

Data 606 - Lab9 - Multiple Linear Regression

25.11.2019

Grading the professor Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related charac...

15547 sym R (15645 sym/47 pcs) 22 img 1 tbl

Data 606 - Hw9 - Multiple and Logistic Regression

25.11.2019

Baby weights, Part I. (9.1, p. 350) The Child Health and Development Studies investigate a range of topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The variable smoke is c...

8710 sym R (839 sym/4 pcs) 2 img

Data 612 - Project 3 - Matrix Factorization Methods

23.06.2020

DATA 612 Project 3 - Matrix Factorization Methods Instruction Introduction Load Packages Read Data Data Exploration Handle the Missing Values Singular Value Decomposition (SVD) Dimensionality Reduction Find k Reduce Matrices’ Dimensionality Best Low Rank Approximation SVD Accuracy Evaluation Build Recommendation Mode...

9775 sym R (14536 sym/29 pcs) 8 img 4 tbl

Data 612 - Project 1 - Global Baseline Predictors and RMSE

18.06.2020

DATA 612 Project 1 - Global Baseline Predictors and RMSE Instruction Introduction Load Packages Read Data Data Exploration Separate Training Dataset & Test Dataset Create a User-Item Matrix Calculate Raw Average Rating Calculate RMSE for Raw Average Calculate the bias for each user and each item Calculate the Baseline P...

3905 sym R (8524 sym/22 pcs) 1 img 5 tbl

Data 612 - Project 2 - Content-Based and Collaborative Filtering

18.06.2020

DATA 612 Project 2 - Content-Based and Collaborative Filtering Instruction Introduction Load Packages Read Data Data Exploration Data Sampling Building the Recommendation Models User-Based Collaborative Filtering Models Item-Based Collaborative Filtering Models Summary Sin Ying Wong, Zhi Ying Chen, Fan Xu 6/13/2020 ...

1175 sym R (338 sym/2 pcs)

Data 605 - Assignment 1

31.08.2020

Data 605 HW1: Vectors, Matrices, Systems of Equation 1 Problem Set 1 1.1 Part(1) 1.2 Part(2) 1.3 Part(3) 1.4 Part(4) 2 Problem Set 2 2.1 Answer Sin Ying Wong 08/30/2020 Please refer to the Assignment 1 Document. library(tidyr) library(dplyr) library(pracma) 1 Problem Set 1 You can think of vectors representing m...

3769 sym R (4531 sym/15 pcs)