Publications by Jie Zou
605 final_part2
605: final part_2 Jie Zou 2021-12-15 library(ggplot2) library(tidyverse) library(dplyr) library(tidyr) library(plotly) https://www.kaggle.com/c/house-prices-advanced-regression-techniques Taking a glance at the data, I see that there are NAs in some numerical and categorical variables. Combined with the description of the data, NA is meaningful in categoric...
2833 sym R (77351 sym/75 pcs) 13 img
605: final
605: final Jie Zou 2021-12-16 library(MASS) Playing with PageRank From matrix A According to the graph we used previously in the course notes. # matrix A is A1 = matrix(c(0,1/2,1/2,0,0,0, 0,0,0,0,0,0, 1/3,1/3,0,0,1/3,0, 0,0,0,0,1/2,1/2, 0,0,0,1/2,0,1/2, 0,0,0,1,0,0), byrow = T, ...
2112 sym R (225595 sym/62 pcs) 23 img
608: hw1
Principles of Data Visualization and Introduction to ggplot2 I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header = TRUE) And let's preview this...
1934 sym R (6726 sym/20 pcs) 3 img
621: hw3
621-hw3 Jie Zou, Euclid Zhang, Leticia Cancel, Joseph Connolly, Chi Pong 2022-03-18 Explore Data stats The data set has 13 variables and 466 cases. # load dataset data <- read.csv('crime-training-data_modified.csv') str(data) ## 'data.frame': 466 obs. of 13 variables: ## $ zn : num 0 0 0 30 0 0 0 0 0 80 ... ## $ indus : num 19.58 1...
2846 sym R (18068 sym/50 pcs) 13 img
Blog5: Model Selection
Blog5: Model Selection Jie Zou 2022-05-20 Data Structure After Feature Engineering ## Rows: 7,907 ## Columns: 13 ## $ year <int> 2014, 2014, 2006, 2010, 2007, 2017, 2007, 2001, 2011, 20… ## $ selling_price <int> 450000, 370000, 158000, 225000, 130000, 440000, 96000, 4… ## $ km_driven <int> 145500, 120000, 140000, 127000, 120000,...
1293 sym R (7438 sym/11 pcs) 3 img
Blog4: Date Classification
Blog4: Date Classification Jie Zou 2022-05-19 Data Structure data dimension: (898, 35); no missing values; only the target variable is character type, the rest are numeric/integer type ## 'data.frame': 898 obs. of 35 variables: ## $ AREA : int 422163 338136 526843 416063 347562 408953 451414 382636 546063 420044 ... ## $ PERIMETER :...
1306 sym R (9141 sym/7 pcs) 5 img
Blog2: Feature Engineering and Tidymodel
Blog2: Tidymodels Jie Zou 2022-05-20 My Thoughts Before working on the final project, I did not realize that feature engineering is important. After reading some statistics documents, I found that feature engineering is very useful for extracting or creating potential features that improve later analysis and modeling. In addition, I’ve never used tidymodels before, I wo...
4136 sym R (4925 sym/12 pcs) 3 img
Data 621 - Final Project v4
#setwd("/Users/dpong/Data 621/Final_Project/Datasets") setwd("~/Library/CloudStorage/OneDrive-CityUniversityofNewYork/621/final_churn_modeling") df <- read.csv("sparkify-medium.csv", stringsAsFactors = FALSE, row.names=1) Data Source In this analysis, we will utilize the sparkify data set created by Udacity, an educational organization. Sparkify ...
14504 sym R (16529 sym/41 pcs) 6 img
blog1: StreamGraph
Streamgraph Usage Jie Zou 2022-05-11 My Thoughts I first heard about the streamgraph a few days ago in the 608 class, where we were doing a live coding demo to figure out how to use d3 to plot data in a web page. I was confused at that time because, compared to other intuitive graphs, the streamgraph was not easy to interpret...
2080 sym R (17238 sym/3 pcs) 2 img
Data 621 - Final Project v2
library("stringr") library("dplyr") library("tidyr") #library("arm") library("pROC") library("car") library("caret") library("reshape2") library("patchwork") Note: No user is using multiple devices. Users stay in the same location. All songs are finished before going to the next song. There is no “remove from Playlist” record. Only users wh...
8080 sym R (22501 sym/40 pcs) 6 img