Publications by Jimmy Ng

stock_prediction

13.05.2020

OVERVIEW Fears of the coronavirus crashed the stock market in February 2020, beginning precisely on February 24. The pandemic sent a shockwave through the global market, and it continues to wreak havoc on humanity. The fears spread quickly and globally, e.g. over 70% of the world population was under lockdown at some point in March. Rece...

10612 sym R (48809 sym/3 pcs) 2 img 5 tbl

data_621_hw_5

09.05.2020

Overview In this homework assignment, you will explore, analyze and model a data set containing information on approximately 12,000 commercially available wines. The variables are mostly related to the chemical properties of the wine being sold. The response variable is the number of sample cases of wine that were purchased by wine distribution c...

3401 sym R (30341 sym/17 pcs) 7 img 2 tbl

data_621_blog_4

03.05.2020

Time Series Analysis This is a time series analysis tutorial on building an ARIMA model. We will be using a simple dataset from the hotel industry. The original sample data has only four columns, i.e. date, room_sold, adr, and revenue. The “date” column refers to the historical check-in date at a hotel in NYC, whereas the “room...

2740 sym R (11230 sym/15 pcs) 11 img

data_621_blog_3

27.04.2020

Odds Ratio Oftentimes, we have to deal with messy categorical data with many missing values, and our goal is to extract key insights and features from various attributes to build some sort of customer profile. For example, imagine you have a data set that has only five variables, i.e. var_a is subject ID, var_b is gender, var_c is education, var...

2552 sym R (5416 sym/7 pcs) 4 tbl

data_621_hw_4

25.04.2020

Introduction In this homework exercise, we build a logistic regression model to estimate the likelihood of a car accident and a multiple regression model to predict the cost when such an accident happens. We have two response variables, i.e. TARGET_FLAG and TARGET_AMT. TARGET_FLAG is a binary field where 1 indicates a crash, ...

3636 sym R (43733 sym/15 pcs) 5 img 5 tbl

Survival Model

22.04.2020

Survival model This is a simple tutorial on building a survival model for a subscription business. The use case is that a media company offers various subscription plans to its customers. Each plan is associated with a different price and billing period, e.g. Annual, Month, Semi-Annual, Two-Year, etc. We need to infer from the billing period associated...

5684 sym R (6349 sym/12 pcs) 1 img 3 tbl

data_621 - Logistics Regression

05.04.2020

load packages, data
# load packages
if(!require(pacman)){install.packages("pacman"); require(pacman)}
## Loading required package: pacman
## Warning: package 'pacman' was built under R version 3.6.2
packages <- c('tidyverse', 'glue', 'broom', 'MASS', 'caret', 'InformationValue', 'Hmisc', 'kableExtra', 'corrplot', 'ROCR')
pacman::p_load(char ...

3099 sym R (15965 sym/8 pcs) 3 img 1 tbl

Classification metrics exercise

14.03.2020

df <- read.csv("classification-output-data.csv", header = TRUE)
dfSubset <- df %>% dplyr::select(class, scored.class, scored.probability)
rawConfusionMatrix <- with(dfSubset, table(scored.class, class))
rawConfusionMatrix
##             class
## scored.class   0   1
##            0 119  30
##            1   5  27
The confusion matrix summar...

382 sym R (9113 sym/13 pcs) 2 img

Data 621 - Moneyball (hw1)

01.03.2020

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.2
## -- Attaching packages ----------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.4
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
##...

359 sym R (44870 sym/98 pcs) 4 img

logistic_regression_tutorial

18.04.2020

Logistic Regression Logistic regression is a very common tool for solving classification problems. Given a binary outcome, we would like to classify whether an event will occur based on a set of quantitative or qualitative variables. In this blog post, we use a public dataset to classify loan status based on various sociodemographic ...

1036 sym R (7910 sym/9 pcs) 2 tbl