Publications by Jimmy Ng
stock_prediction
OVERVIEW Fears of the coronavirus crashed the stock market in February 2020, beginning precisely on February 24. The pandemic sent a shockwave through the global market and continues to wreak havoc on humanity. The fears spread quickly and globally, e.g. over 70% of the world population was under lockdown at some point in March. Rece...
10612 sym R (48809 sym/3 pcs) 2 img 5 tbl
data_621_hw_5
Overview In this homework assignment, you will explore, analyze and model a data set containing information on approximately 12,000 commercially available wines. The variables are mostly related to the chemical properties of the wine being sold. The response variable is the number of sample cases of wine that were purchased by wine distribution c...
3401 sym R (30341 sym/17 pcs) 7 img 2 tbl
data_621_blog_4
Time Series Analysis This is a time series analysis tutorial on building an ARIMA model. We will be using a simple dataset from the hotel revenue industry. The original sample data has only four columns, i.e. date, room_sold, adr, and revenue. The “date” refers to the historical check-in date at a hotel in NYC, whereas the “room...
2740 sym R (11230 sym/15 pcs) 11 img
data_621_blog_3
Odds Ratio Oftentimes, we have to deal with a lot of unclean or missing categorical data, and our goal is to extract key insights and features from various attributes to build some sort of customer profile. For example, imagine you have a data set that has only five variables, i.e. var_a is subject ID, var_b is gender, var_c is education, var...
2552 sym R (5416 sym/7 pcs) 4 tbl
data_621_hw_4
Introduction This homework exercise builds a logistic regression model and a multiple regression model to estimate the likelihood of a car accident and, if one occurs, to predict its cost. We have two response variables, i.e. TARGET_FLAG and TARGET_AMT. TARGET_FLAG is a binary field where 1 indicates a crash, ...
3636 sym R (43733 sym/15 pcs) 5 img 5 tbl
Survival Model
Survival model This is a simple tutorial on building a survival model for a subscription business. The use case is that a media company offers various subscription plans to its customers. Each plan is associated with a different price and billing period, e.g. Annual, Month, Semi-Annual, Two-Year, etc. We need to infer from the billing period associated...
5684 sym R (6349 sym/12 pcs) 1 img 3 tbl
data_621 - Logistics Regression
load packages, data # load packages if(!require(pacman)){install.packages("pacman"); require(pacman)} ## Loading required package: pacman ## Warning: package 'pacman' was built under R version 3.6.2 packages <- c('tidyverse', 'glue', 'broom', 'MASS', 'caret', 'InformationValue', 'Hmisc', 'kableExtra', 'corrplot', 'ROCR') pacman::p_load(char ...
3099 sym R (15965 sym/8 pcs) 3 img 1 tbl
Classification metrics exercise
df <- read.csv("classification-output-data.csv", header = TRUE)
dfSubset <- df %>% dplyr::select(class, scored.class, scored.probability)
rawConfusionMatrix <- with(dfSubset, table(scored.class, class))
rawConfusionMatrix
##             class
## scored.class   0   1
##            0 119  30
##            1   5  27
The confusion matrix summar...
382 sym R (9113 sym/13 pcs) 2 img
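The counts in the confusion matrix previewed above are enough to recompute the usual classification metrics by hand. What follows is a minimal sketch in base R, not the author's code. It assumes that class 1 is the positive class (the excerpt does not say which), and the names cm, tp, tn, fp, fn and metrics are illustrative.

```r
# Counts copied from the printed confusion matrix:
# rows = scored.class (predicted), columns = class (actual).
# matrix() fills column by column, so the first column is actual class 0.
cm <- matrix(c(119, 5, 30, 27), nrow = 2,
             dimnames = list(scored.class = c("0", "1"),
                             class = c("0", "1")))

# Assumption: class 1 is treated as the positive class.
tp <- cm["1", "1"]  # predicted 1, actual 1
tn <- cm["0", "0"]  # predicted 0, actual 0
fp <- cm["1", "0"]  # predicted 1, actual 0
fn <- cm["0", "1"]  # predicted 0, actual 1

metrics <- c(
  accuracy    = (tp + tn) / sum(cm),
  precision   = tp / (tp + fp),
  sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp)
)
print(round(metrics, 3))
```

Under that assumption the model is right about 81% of the time overall, yet it catches fewer than half of the actual positives (sensitivity of about 0.47), which is why accuracy alone can be misleading here.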
Data 621 - Moneyball (hw1)
library(tidyverse) ## Warning: package 'tidyverse' was built under R version 3.6.2 ## -- Attaching packages ----------------------------------------------------------- tidyverse 1.3.0 -- ## v ggplot2 3.2.1 v purrr 0.3.3 ## v tibble 2.1.3 v dplyr 0.8.4 ## v tidyr 1.0.2 v stringr 1.4.0 ## v readr 1.3.1 v forcats 0.4.0 ##...
359 sym R (44870 sym/98 pcs) 4 img
logistic_regression_tutorial
Logistic Regression Logistic regression is a very common tool for solving classification problems. Given a binary outcome, we would like to classify whether an event will occur based on a set of quantitative or qualitative variables. In this blog post, we use a public dataset to classify loan status based on various social demographic ...
1036 sym R (7910 sym/9 pcs) 2 tbl