Publications by Miao Yu
Exploratory Data Analysis and Visualization - gganimate
Load libraries library(tidyverse) library(gganimate) library(gifski) Introduction In this module, we will learn how to create animated visualizations with the gganimate package. gganimate extends the grammar of graphics as implemented by ggplot2 to include the description of animation. The help documentation of gganimate can be found at https://ggan...
Exploratory Data Analysis and Visualization - Lecture 16 Geospatial Data 1 - Vectors
Load Libraries In this class, we will study visualization of geospatial data. Below are the packages that need to be installed and loaded. library(sf) library(ggplot2) library(dplyr) library(tidyr) library(scales) library(RColorBrewer) library(units) library(cowplot) Reference Textbook and Data Downloading...
Exploratory Data Analysis and Visualization - Lecture 15 Lists, Loop and Functionals
0. Introduction In this module, we will learn more details about R data structures and functions to strengthen our basics. The topics covered include: Lists Functions and Conditional Statements (self-study in a separate notebook) Loops Functionals 1. List We have learned how to create vectors that contain multiple values of the same type with...
Exploratory Data Analysis and Visualization - Lecture 14 Dates and Time
Introduction For obvious reasons, dates and times are a very common and important type of data. For example, in flights we have information about the scheduled departure time, actual departure time, scheduled arrival time, and actual arrival time. We also have a time_hour column to record the scheduled date and hour (but with minutes ignored) in a...
Exploratory Data Analysis and Visualization - Functions in R
Functions So far we have nearly always used built-in functions to fulfill tasks. However, sometimes built-in functions are not enough to do some specific things that we hope to do. In the last lecture, there was such an example. make_datetime_100 <- function(year, month, day, time, tz = "UTC") { make_datetime(year, month, day, time %/% 100, time...
Exploratory Data Analysis and Visualization - Lecture 13 Factors
Load Libraries library(tidyverse) library(nycflights13) Introduction In R, factors are the data type used to work with categorical variables, variables that have a fixed and known set of possible values. They are also useful when you want to display character vectors in a non-alphabetical order. Historically, factors were much easier to work wit...
Exploratory Data Analysis and Visualization - Lecture 12 String Basics and Regular Expressions
Load Libraries library(tidyverse) library(nycflights13) Introduction Strings are collections of characters, which are used to store “text data”, or any data represented as text. It is very important to be skilled at handling strings in data science. In this module, we will study Basics in string manipulation in R Basics in regular expre...
Exploratory Data Analysis and Visualization - Lecture 11 Data Import
Load Libraries library(tidyverse) Introduction The first step of working on any data set in RStudio is, of course, importing the data set. Because data are collected, transferred and stored in various ways, there are many different data formats. For tabular data, below are a few commonly used data formats: csv (comma separated value, a text f...
COS531 Modern Applied Statistical Learning Lab 10: Classification - Logistic Regression and Linear Discriminant Analysis
Introduction This report explores logistic regression and linear discriminant analysis (LDA) to predict credit card default using the Default dataset from ISLR2. We will: Load and explore the dataset. Perform logistic regression and LDA. Evaluate model performance using ROC curves and AUC. Load Libraries library(ISLR2) library(tidyverse) library(...
COS531 Modern Applied Statistical Learning Lab 9: Analysis of Variance (ANOVA)
Introduction This document provides examples of how to apply ANOVA in different statistical settings: One-way ANOVA test: Comparing the means of multiple groups. Testing the overall significance of a regression model. Comparing nested regression models using ANOVA. Example 1: One-Way ANOVA (Testing Mean Differences Between Groups) Data Overview ...