Publications by Jie Zou

605 final_part2

16.12.2021

605: final part_2 Jie Zou 2021-12-15 library(ggplot2) library(tidyverse) library(dplyr) library(tidyr) library(plotly) https://www.kaggle.com/c/house-prices-advanced-regression-techniques Taking a glance at the data, I see that there are NAs in some numerical and categorical variables. Combined with the description of the data, NA is meaningful in categoric...

2833 sym R (77351 sym/75 pcs) 13 img

605: final

16.12.2021

605:final Jie Zou 2021-12-16 library(MASS) Playing with PageRank From matrix A According to the graph we used previously in the course note. # matrix A is A1 = matrix(c(0,1/2,1/2,0,0,0, 0,0,0,0,0,0, 1/3,1/3,0,0,1/3,0, 0,0,0,0,1/2,1/2, 0,0,0,1/2,0,1/2, 0,0,0,1,0,0), byrow = T, ...

2112 sym R (225595 sym/62 pcs) 23 img

608: hw1

14.02.2022

Principles of Data Visualization and Introduction to ggplot2 I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header= TRUE) And let's preview this...

1934 sym R (6726 sym/20 pcs) 3 img

621: hw3

18.03.2022

621-hw3 Jie Zou, Euclid Zhang, Leticia Cancel, Joseph Connolly, Chi Pong 2022-03-18 Explore Data stats The data set has 13 variables and 466 cases. # load dataset data <- read.csv('crime-training-data_modified.csv') str(data) ## 'data.frame': 466 obs. of 13 variables: ## $ zn : num 0 0 0 30 0 0 0 0 0 80 ... ## $ indus : num 19.58 1...

2846 sym R (18068 sym/50 pcs) 13 img

Blog5: Model Selection

20.05.2022

Blog5: Model Selection Jie Zou 2022-05-20 Data Structure After Feature Engineering ## Rows: 7,907 ## Columns: 13 ## $ year <int> 2014, 2014, 2006, 2010, 2007, 2017, 2007, 2001, 2011, 20… ## $ selling_price <int> 450000, 370000, 158000, 225000, 130000, 440000, 96000, 4… ## $ km_driven <int> 145500, 120000, 140000, 127000, 120000,...

1293 sym R (7438 sym/11 pcs) 3 img

Blog4: Date Classification

19.05.2022

Blog4: Date Classification Jie Zou 2022-05-19 Data Structure data dimension: (898, 35); no missing values; only the target variable is character type, the rest are numeric/integer type ## 'data.frame': 898 obs. of 35 variables: ## $ AREA : int 422163 338136 526843 416063 347562 408953 451414 382636 546063 420044 ... ## $ PERIMETER :...

1306 sym R (9141 sym/7 pcs) 5 img

Blog2: Feature Engineering and Tidymodel

16.05.2022

Blog2: Tidymodels Jie Zou 2022-05-20 My Thought Before working on the final project, I did not realize that feature engineering was important. After I read some stat documents, I found out that feature engineering is super useful to extract/create potentials to enhance future analysis and modeling. In addition, I’ve never used tidymodels before, I wo...

4136 sym R (4925 sym/12 pcs) 3 img

Data 621 - Final Project v4

14.05.2022

#setwd("/Users/dpong/Data 621/Final_Project/Datasets") setwd("~/Library/CloudStorage/OneDrive-CityUniversityofNewYork/621/final_churn_modeling") df <- read.csv("sparkify-medium.csv", stringsAsFactors = FALSE, row.names=1) Data Source In this analysis, we will utilize the sparkify data set created by Udacity, an educational organization. Sparkify ...

14504 sym R (16529 sym/41 pcs) 6 img

blog1: StreamGraph

11.05.2022

Streamgraph Usage Jie Zou 2022-05-11 My Thoughts The first time I heard about stream-graphs was a few days ago in the 608 class, where we were doing a live coding demo to figure out how to use d3 to plot data in a webpage. I was confused at that time because compared to other intuitive graphs, stream-graph was not easy to interpret...

2080 sym R (17238 sym/3 pcs) 2 img

Data 621 - Final Project v2

14.05.2022

library("stringr") library("dplyr") library("tidyr") #library("arm") library("pROC") library("car") library("caret") library("reshape2") library("patchwork") Note: No user is using multiple devices. Users stay in the same location. All songs are finished before going to the next song. There is no “remove from Playlist” record. Only users wh...

8080 sym R (22501 sym/40 pcs) 6 img