Publications by Rongen Zhang
Logistic Regression
Agenda: Case Study: Titanic Survival; Import Data; Exploration and Data Preparation; Build Logistic Model; Predict with Logistic Model; Evaluate Logistic Model. Case Study - Titanic Survival Prediction: The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsin...
2528 sym R (6712 sym/22 pcs) 1 img
Decision Tree Analysis
Agenda: Decision Trees; Collecting/Importing Data; Exploring and Preparing the Data; Partition data into training and test datasets; Training a Decision Tree Model; Evaluating Model Performance; Improving Decision Tree Accuracy. The C5.0 Decision Tree Algorithm: There are numerous implementations of decision trees, but one of the most well-known impleme...
5624 sym R (62193 sym/38 pcs) 3 img
Multiple Regression Analysis
Agenda: Simple Linear Regression & Multiple Linear Regression; Importing Data; Exploring and Preparing the Data; Exploring relationships among features with correlation; Visualizing relationships among features - scatterplot matrix; Training a Model; Evaluating Model Performance; Adding Interaction Terms. Linear Regression: Linear regression is a statis...
6488 sym R (3286 sym/17 pcs) 3 img
Association Rules/Market Basket Analysis
Agenda: Importing Data; Data Exploration and Preparation; Model Training; Model Evaluation. Market Basket Analysis: Market basket analysis is used behind the scenes for the recommendation systems used in many brick-and-mortar and online retailers. The learned association rules indicate the combinations of items that are often purchased together. In thi...
6976 sym R (6724 sym/31 pcs) 5 img
Clustering Evaluation and Hierarchical Clustering
Learning Objectives: Data Exploration and Preparation; Standardization/Scaling; Clustering; Model Evaluation; Hierarchical Clustering. Case Introduction - Customer Segmentation: In the business world, age and income are two crucial features that can be used to segment potential customers, as they are highly influential for purchasing behaviors and capa...
3183 sym Python (5422 sym/22 pcs) 6 img
Cluster Analysis
Agenda: Clustering Analysis; K-Means Clustering; Data Exploration and Preparation; Model Training; Model Evaluation; Hierarchical Clustering. Clustering Analysis: Cluster analysis is an unsupervised machine learning method in data mining. It is used for grouping a set of objects in such a way that objects in the same group (called a cluster) are more si...
6396 sym R (80984 sym/46 pcs) 6 img
BI Lab 2
Recap: The RStudio environment; Basic operators; Basic data types; Define variables (variable_name <- some_value, target <- data). Agenda: Advanced data types (Vector, List, Matrix, Data frame). R Data Types: Numeric, Character, Logical, Factor. Vector: a set of values, all of the same data type. List: a set of values, potentially with different data types. Matri...
4540 sym
Document
The tidyverse: The tidyverse is a collection of R packages designed for data science. https://www.tidyverse.org/ Install the complete tidyverse with: install.packages("tidyverse") Load the tidyverse into the R environment: library(tidyverse) ## ── Attaching packages ────────────────────────...
5596 sym R (8416 sym/45 pcs) 1 img
Publish Document
Agenda: Data input; Data output; Data summary; Summarizing data with figures. Getting data into R: Importing data into R is fairly simple. We can use built-in functions or libraries to read data from the following sources: text file (.txt), comma-separated values (.csv), Excel (.xlsx or .xls), database table. Common data formatting: Regardless of the sourc...
6609 sym R (10560 sym/50 pcs) 13 img
Lab05 Flow Control
Flow control: if-else, for, while, function. if-else: An if statement consists of a logical condition (TRUE or FALSE) followed by one or more statements. # Template in words: if(a logical condition) { Get inside the curly brackets and run this block when the condition is true } # Example: x = 1 if(x == 1) { print("x equals 1") } ## [1] "x equals 1"...
3271 sym 1 tbl