Publications by CHUN-LI HOU
Final Report
Environment & Data 1. Environment Setting if(!require("pacman")) install.packages("pacman") pacman::p_load(dplyr, caret, GGally, Hmisc, broom, tidyr, car, e1071, rpart, rpart.plot, rattle, randomForest, h2o, forecast) options(digits = 3) theme_set(theme_minimal()) set.seed(123) 2. Data Preprocessing a. Data Loading df.0 = r...
891 sym R (19945 sym/63 pcs) 13 img
Natural Language Processing
Package importing pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, caret) Dataset importing dataset = read.delim('Restaurant_Reviews.tsv', quote = '', stringsAsFactors = F) dataset.ori = read.delim('Restaurant_Reviews.tsv', quote = '', stri...
130 sym R (7085 sym/10 pcs) 3 img
Classification
Package importing pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, caret) Dataset importing dataset = read.csv('breast_cancer.csv') dataset = dataset[, -1] dataset = data.frame(lapply(dataset, as.factor)) dataset$Class = factor(dataset$Class, levels = c(2, 4), labels = c(0, 1)) glimpse(dataset) ## Rows: 683 ...
148 sym R (7839 sym/12 pcs) 3 img
ANN Regression
Package importing pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, caret, readxl, h2o, forecast) Data preprocessing # Importing dataset = read_excel('Folds5x2_pp.xlsx') # Scaling dataset[, -ncol(dataset)] = scale(dataset[, -ncol(dataset)]) # Partitioning set.seed(123) split = sam...
138 sym R (4526 sym/15 pcs)
Logistic Regression
Logistic Regression Importing the libraries pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, caret) Importing the dataset dataset = read.csv('breast_cancer.csv') glimpse(dataset) ## Rows: 683 ## Columns: 11 ## $ Sample.code.number <int> 1000025, 1002945, 1015425, 1016277, 101... ## $ Clump.Thickness <int> 5, 5,...
314 sym R (2818 sym/13 pcs)
Regression
Library importing pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, forecast) Dataset importing dataset = read.csv('realestate_roc.csv') dataset = dataset[, c(3, 4, 5, 8)] colnames(dataset) = c('age', 'd.mrt', 'n.store', 'price') glimpse(dataset) ## Rows: 414 ## Columns: 4 ## $ age <dbl> 32.0, 19.5, 13.3, 13.3, 5.0, 7.1, 34.5, 2...
129 sym R (3671 sym/10 pcs) 2 img
Visualizing Covid-19
VISUALIZING COVID-19 About Course: DataCamp Date: 09202020 Project: Visualizing Covid-19 Reference: https://github.com/RamiKrispin/coronavirus 1. From epidemic to pandemic In December 2019, COVID-19 coronavirus was first identified in the Wuhan region of China. By March 11, 2020, the World Health Organization (WHO) categorized the COVID-19 outbr...
4595 sym R (4820 sym/16 pcs) 7 img
Transaction Analysis
1 Objective The main objective of this analysis is to identify key insights at each stage of the funnel (viewing a product, adding or removing it, and making a purchase) so that conversion rates can be optimized through concrete recommendations and performance improvements. This includes uncovering business intelligence,...
4435 sym R (8915 sym/8 pcs) 6 img
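The funnel idea in the objective above can be sketched in a few lines of R. This is a minimal illustration only: the stage names follow the preview, but the counts are made-up numbers and the column names (`users`, `step.rate`, `overall.rate`) are hypothetical, not taken from the original analysis.

```r
# Hypothetical funnel counts (illustrative numbers, not from the original data)
funnel = data.frame(
  stage = c("view", "add", "purchase"),
  users = c(1000, 300, 90)
)

# Step conversion: share of users from the previous stage who reach this stage
funnel$step.rate = c(NA, funnel$users[-1] / funnel$users[-nrow(funnel)])

# Overall conversion: share of users from the first stage who reach this stage
funnel$overall.rate = funnel$users / funnel$users[1]

print(funnel)
```

The step rate shows where the funnel loses the most users between adjacent stages, while the overall rate tracks cumulative loss from the first stage.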
Time Series Analysis
1 Objective For prediction, we can consider two model types. An explanatory model takes independent variables into account and is also based on past values of the dependent variable (in this case, avocado prices). A time series model uses only past information (in this case, avocado prices). In t...
7011 sym R (16641 sym/23 pcs) 15 img
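The distinction between the two model types above can be illustrated with a small base-R sketch. Everything here is an assumption made for illustration: the data are simulated, `volume` is a hypothetical explanatory variable, and the AR(1) specification is just one possible time series model, not necessarily the one used in the original analysis.

```r
set.seed(123)

# Simulated data: a hypothetical explanatory variable and a price series
n = 120
volume = rnorm(n, mean = 100, sd = 10)
noise = as.numeric(arima.sim(model = list(ar = 0.6), n = n, sd = 0.1))
price = 2 - 0.005 * volume + noise

# Align each price with its previous value (lag 1) and the current volume
d = data.frame(price = price[-1], lag1 = price[-n], volume = volume[-1])
train = d[1:99, ]     # prices at times 2..100
test = d[100:119, ]   # prices at times 101..120

# Explanatory model: independent variable plus the past dependent variable
fit.exp = lm(price ~ lag1 + volume, data = train)
pred.exp = predict(fit.exp, newdata = test)

# Time series model: uses only the past values of the price itself
fit.ts = arima(price[1:100], order = c(1, 0, 0))
pred.ts = as.numeric(predict(fit.ts, n.ahead = 20)$pred)

# Root mean squared error on the hold-out period
rmse = function(actual, predicted) sqrt(mean((actual - predicted)^2))
c(explanatory = rmse(test$price, pred.exp),
  time.series = rmse(price[101:120], pred.ts))
```

Note that the explanatory forecasts here are one-step-ahead and use the observed lagged price and volume in the hold-out period, while the AR(1) forecasts are made 20 steps ahead from the end of the training window, so the two errors are not strictly comparable.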
Reservation Analysis
1 Objective This project investigates the differences between two types of hotels, city hotels and resort hotels, using the data available in the dataset. A city hotel is typically just a place for lodging, whereas a resort hotel is usually located near coastal regions, rainforests, or theme parks, or has in-house entertainment facilities for relaxation. We will have ...
22326 sym R (85978 sym/83 pcs) 40 img 6 tbl