Publications by CHUN-LI HOU

Final Report

26.11.2020

Environment & Data 1. Environment Setting if(!require("pacman")) install.packages("pacman") pacman::p_load(dplyr, caret, GGally, Hmisc, broom, tidyr, car, e1071, rpart, rpart.plot, rattle, randomForest, h2o, forecast) options(digits = 3) theme_set(theme_minimal()) set.seed(123) 2. Data Preprocessing a. Data Loading df.0 = r...

891 sym R (19945 sym/63 pcs) 13 img

Natural Language Processing

03.11.2020

Package importing pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, caret) Dataset importing dataset = read.delim('Restaurant_Reviews.tsv', quote = '', stringsAsFactors = F) dataset.ori = read.delim('Restaurant_Reviews.tsv', quote = '', stri...

130 sym R (7085 sym/10 pcs) 3 img

Classification

03.11.2020

Package importing pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, caret) Dataset importing dataset = read.csv('breast_cancer.csv') dataset = dataset[, -1] dataset = data.frame(lapply(dataset, as.factor)) dataset$Class = factor(dataset$Class, levels = c(2, 4), labels = c(0, 1)) glimpse(dataset) ## Rows: 683 ...

148 sym R (7839 sym/12 pcs) 3 img

ANN Regression

03.11.2020

Package importing pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, caret, readxl, h2o, forecast) Data preprocessing # Importing dataset = read_excel('Folds5x2_pp.xlsx') # Scaling dataset[, -ncol(dataset)] = scale(dataset[, -ncol(dataset)]) # Partitioning set.seed(123) split = sam...

138 sym R (4526 sym/15 pcs)

Logistic Regression

03.11.2020

Logistic Regression Importing the libraries pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, caret) Importing the dataset dataset = read.csv('breast_cancer.csv') glimpse(dataset) ## Rows: 683 ## Columns: 11 ## $ Sample.code.number <int> 1000025, 1002945, 1015425, 1016277, 101... ## $ Clump.Thickness <int> 5, 5,...

314 sym R (2818 sym/13 pcs)

Regression

03.11.2020

Library importing pacman::p_load(dplyr, broom, caTools, ggplot2, gridExtra, forecast) Dataset importing dataset = read.csv('realestate_roc.csv') dataset = dataset[, c(3, 4, 5, 8)] colnames(dataset) = c('age', 'd.mrt', 'n.store', 'price') glimpse(dataset) ## Rows: 414 ## Columns: 4 ## $ age <dbl> 32.0, 19.5, 13.3, 13.3, 5.0, 7.1, 34.5, 2...

129 sym R (3671 sym/10 pcs) 2 img

Visualizing Covid-19

03.11.2020

VISUALIZING COVID-19 About Course: Datacamp Date: 09202020 Project: Visualizing Covid-19 Reference: https://github.com/RamiKrispin/coronavirus 1. From epidemic to pandemic In December 2019, COVID-19 coronavirus was first identified in the Wuhan region of China. By March 11, 2020, the World Health Organization (WHO) categorized the COVID-19 outbr...

4595 sym R (4820 sym/16 pcs) 7 img

Transaction Analysis

14.01.2021

1 Objective The main objective of this analysis is to identify key insights at each stage of the purchase funnel (viewing a product, adding it, removing it, and making a purchase) in order to optimize conversion rates through targeted recommendations and performance improvements. This includes uncovering business intelligence,...

4435 sym R (8915 sym/8 pcs) 6 img

Time Series Analysis

13.06.2021

1 Objective For prediction purposes, we can consider two model types. An explanatory model takes independent variables into account and also uses the past values of the dependent variable (in this case, avocado prices). A time series model uses only past information (in this case, avocado prices). In t...
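The distinction between the two model types can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data are simulated, and the names `supply`, `price`, `price.lag`, `fit.exp`, and `fit.ts` are hypothetical and do not come from the original analysis, which may use different variables and packages.

```r
# Hypothetical toy data (NOT the avocado dataset): a price series
# driven by a made-up independent variable called supply.
set.seed(123)
n = 60
supply = rnorm(n, mean = 100, sd = 10)
noise = as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 0.05))
price = 3 - 0.01 * supply + noise

# Previous-period price, used as the "past historical dependent variable"
price.lag = c(NA, price[-n])

# Explanatory model: an independent variable plus the lagged price
# (lm drops the first row, where the lag is NA)
fit.exp = lm(price ~ supply + price.lag)

# Time series model: past prices only, here a simple AR(1)
fit.ts = arima(price, order = c(1, 0, 0))

# Four-step-ahead forecast from the time series model
fc = predict(fit.ts, n.ahead = 4)$pred
```

The explanatory fit needs future values of `supply` before it can forecast, whereas the AR(1) fit can be extended from the price history alone; that trade-off is the practical difference between the two model types.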

7011 sym R (16641 sym/23 pcs) 15 img

Reservation Analysis

03.08.2021

1 Objective This project investigates the differences between two types of hotels, city hotels and resort hotels, using the data available in the dataset. A city hotel is mainly a place for lodging, whereas a resort hotel is typically located near coastal regions, rain forests, or theme parks, or has in-house entertainment facilities for relaxation. We will have ...

22326 sym R (85978 sym/83 pcs) 40 img 6 tbl