Publications by Yunting Chiu
biomass data for linear regression
1 Exercise 1 For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer. Before answering the questions, note that an inflexible method is a simple method, while a flexible method is a complex one. ...
9803 sym R (3994 sym/18 pcs)
LDA, QDA, KNN Models Implementation
In this assignment, I will use Tidymodels instead of base R. 1 Exercise 1 (10 points) Suppose we collect data for a group of students in a statistics class with variables \(X_1\) = hours studied, \(X_2\) = undergrad GPA, and \(Y\) = receive an A. We fit a logistic regression and produce estimated coefficients, \(\hat{\beta}_0=-6...
14211 sym R (18514 sym/95 pcs) 10 img
cross-validation and bootstraps
Libraries and Data library(tidymodels) # Includes the workflows package ## ── Attaching packages ────────────────────────────────────── tidymodels 0.1.1 ── ## ✓ broom 0.7.6 ✓ recipes 0.1.14 ## ✓ dials 0.0.9 ✓ rsample 0.0.9 ## ✓ dpl...
1862 sym R (9015 sym/36 pcs)
K-means Clustering
Palmer Station Penguin Data We will be using the palmerpenguins data set for this lab. We will also need to load the broom package. library(palmerpenguins) library(broom) library(tidymodels) ## ── Attaching packages ────────────────────────────────────── tidymode...
2759 sym R (11067 sym/44 pcs) 6 img
Statistical Machine Learning for Bitcoin Prediction
1. Abstract As Wall Street giants, retail investors, and aspiring cryptocurrency trailblazers continue to flood the cryptocurrency market, the ability to predict the volatility of cryptocurrencies has become increasingly valuable. In this report, we detail our methodology that applies statistical machine learning techniques to predic...
21643 sym R (35761 sym/154 pcs) 21 img 1 tbl
SVM Models
Khan Gene Data SVM transforms our data using a technique known as the kernel trick, and then finds an optimal boundary between the possible outputs based on these transformations. In this lab, we will explore how to use SVM models. We will start by using the Khan data set from the ISLR package. library(ISLR) library(tidymodels) ## Registered S3 m...
1755 sym R (6635 sym/41 pcs) 5 img
Shrinkage and Hyperparameter Tuning
This week we will talk about shrinkage and hyperparameter tuning. We will use the Hitters data set from the ISLR library. It can be loaded using the following code. The vast majority of variables are numerical, with the remainder being factors. library(tidyverse) library(tidymodels) library(ISLR) data("Hitters") Hitters %>% str() ## 'data.frame'...
2949 sym R (10294 sym/45 pcs) 3 img
Tuning Hyperparameter
In this assignment, I will use the Tidymodels framework instead of base R. 1 Exercise 1 (10 points) Explain the assumptions we are making when performing Principal Component Analysis (PCA). What happens when these assumptions are violated? Sampling adequacy: A large enough sample is required to perform PCA. Otherwise, PCA can’t be correc...
12014 sym R (12885 sym/54 pcs) 5 img
Decision Trees and Random Forests
We will use this lab to explore decision trees and random forests using the palmerpenguins package. We will also use a couple of other packages such as rpart.plot, rpart, ranger, and vip. library(tidymodels) library(palmerpenguins) Split the Data set.seed(1234) penguins_split <- initial_split(penguins) penguins_train <- training(penguins_sp...
1772 sym R (3767 sym/27 pcs) 4 img
Spline
We will use the ames data set from the modeldata library. It can be loaded using the following code. Ames Housing Data library(tidymodels) library(tidyverse) data("ames") ames ## # A tibble: 2,930 x 74 ## MS_SubClass MS_Zoning Lot_Frontage Lot_Area Street Alley Lot_Shape ## * <fct> <fct> <dbl> <int> <fct...
1645 sym R (14869 sym/51 pcs) 3 img