Publications by Yunting Chiu

biomass data for linear regression

23.05.2021

1 Exercise 1 For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer. Before answering the questions, note that an inflexible method is a simple method, while a flexible method is a complex one. ...

9803 sym R (3994 sym/18 pcs)
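
The flexible-versus-inflexible distinction in the excerpt above can be illustrated with a small base R sketch. It is not taken from the publication: the simulated data, the sample size, and the choice of a straight line versus a degree-10 polynomial are all assumptions made for illustration.

```r
# Simulated data: a nonlinear signal plus noise (all values are illustrative).
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
y <- sin(2 * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# Hold out half of the observations for testing.
train <- sample(n, n / 2)

# Inflexible method: a straight line. Flexible method: a degree-10 polynomial.
fit_simple   <- lm(y ~ x, data = dat[train, ])
fit_flexible <- lm(y ~ poly(x, 10), data = dat[train, ])

# Mean squared error of a fitted model on a given data frame.
mse <- function(fit, newdata) mean((newdata$y - predict(fit, newdata))^2)

train_mse <- c(simple   = mse(fit_simple,   dat[train, ]),
               flexible = mse(fit_flexible, dat[train, ]))
test_mse  <- c(simple   = mse(fit_simple,   dat[-train, ]),
               flexible = mse(fit_flexible, dat[-train, ]))
print(rbind(train_mse, test_mse))
```

Because the linear model is nested inside the polynomial one, the flexible fit can never have a higher training error. Its test error, however, depends on the true signal, the sample size, and the noise level, which is what parts (a) through (d) of the exercise ask about.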

LDA, QDA, KNN Models Implementation

29.05.2021

In this assignment, I will be using Tidymodels instead of base R. 1 Exercise 1 (10 points) Suppose we collect data for a group of students in a statistics class with variables \(X_1\) = hours studied, \(X_2\) = undergrad GPA, and \(Y\) = receive an A. We fit a logistic regression and produce estimated coefficients, \(\hat{\beta}_0=-6...

14211 sym R (18514 sym/95 pcs) 10 img
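
As a rough sketch of how a fitted two-predictor logistic regression like the one in the excerpt above turns coefficients into a probability, the base R function below applies the inverse logit to the linear predictor. Only the intercept of -6 appears in the excerpt; the function name `logistic_prob`, the slope values, and the example student are hypothetical placeholders, not the values from the exercise.

```r
# Estimated probability of Y = 1 from a two-predictor logistic regression.
logistic_prob <- function(x1, x2, b0, b1, b2) {
  eta <- b0 + b1 * x1 + b2 * x2  # linear predictor (log-odds)
  1 / (1 + exp(-eta))            # inverse logit maps log-odds to (0, 1)
}

# b0 = -6 comes from the excerpt; b1, b2 and the student's values are
# placeholders chosen only to make the example run.
p <- logistic_prob(x1 = 40, x2 = 3.5, b0 = -6, b1 = 0.1, b2 = 0.5)
print(p)
```

The same quantity is available through base R's `plogis()`, which computes the inverse logit directly.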

cross-validation and bootstraps

04.06.2021

Libraries and Data library(tidymodels) # Includes the workflows package ## ── Attaching packages ────────────────────────────────────── tidymodels 0.1.1 ── ## ✓ broom 0.7.6 ✓ recipes 0.1.14 ## ✓ dials 0.0.9 ✓ rsample 0.0.9 ## ✓ dpl...

1862 sym R (9015 sym/36 pcs)

K-means Clustering

06.06.2021

Palmer Station Penguin Data We will be using the palmerpenguins data set for this lab. We will also need to load the broom package. library(palmerpenguins) library(broom) library(tidymodels) ## ── Attaching packages ────────────────────────────────────── tidymode...

2759 sym R (11067 sym/44 pcs) 6 img

Statistical Machine Learning for Bitcoin Prediction

25.06.2021

1. Abstract As Wall Street giants, retail investors, and aspiring cryptocurrency trailblazers continue to flood the cryptocurrency market, the ability to predict the volatility of cryptocurrency stocks has proven to be increasingly valuable. In this report, we detail our methodology that applies statistical machine learning techniques to predic...

21643 sym R (35761 sym/154 pcs) 21 img 1 tbl

SVM Models

22.06.2021

Khan Gene Data SVM transforms our data using a technique known as the kernel trick, and then finds an optimal boundary between the possible outputs based on these transformations. In this lab, we will explore how to use SVM models. We will start by using the Khan data set from the ISLR package. library(ISLR) library(tidymodels) ## Registered S3 m...

1755 sym R (6635 sym/41 pcs) 5 img
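
The kernel trick mentioned in the excerpt above can be sketched in base R. The example is an illustration only, not the lab's code: the degree-2 polynomial kernel, the feature map `phi`, and the two input vectors are assumptions. It shows that a kernel evaluated in the original space equals an inner product in a higher-dimensional feature space, so the transformed data never has to be formed explicitly.

```r
# Explicit degree-2 feature map for a 2-D input:
# phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2).
phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)

# Homogeneous polynomial kernel of degree 2, computed in the original space.
poly_kernel <- function(x, z) sum(x * z)^2

# Two illustrative input vectors.
x <- c(1, 2)
z <- c(3, -1)

explicit <- sum(phi(x) * phi(z))  # inner product after mapping to 3-D
implicit <- poly_kernel(x, z)     # same value without ever forming phi
print(c(explicit = explicit, implicit = implicit))
```

An SVM only needs these pairwise kernel values to find its separating boundary, which is why the choice of kernel determines the shape of the boundary in the original space.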

Shrinkage and Hyperparameter Tuning

14.06.2021

This week we will talk about shrinkage and hyperparameter tuning. We will use the Hitters data set from the ISLR library. It can be loaded using the following code. The vast majority of variables are numerical, with the remainder being factors. library(tidyverse) library(tidymodels) library(ISLR) data("Hitters") Hitters %>% str() ## 'data.frame'...

2949 sym R (10294 sym/45 pcs) 3 img

Tuning Hyperparameter

16.06.2021

In this assignment, I will be using the Tidymodels framework instead of base R. 1 Exercise 1 (10 points) Explain the assumptions we are making when performing Principal Component Analysis (PCA). What happens when these assumptions are violated? Sampling adequacy: Large enough sample data are required to perform PCA. Otherwise, PCA can’t be correc...

12014 sym R (12885 sym/54 pcs) 5 img

Decision Trees and Random Forests

17.06.2021

We will be using this lab to explore decision trees and random forests using the palmerpenguins package. We will also use a couple of other packages such as rpart.plot, rpart, ranger, and vip. library(tidymodels) library(palmerpenguins) Split the Data set.seed(1234) penguins_split <- initial_split(penguins) penguins_train <- training(penguins_sp...

1772 sym R (3767 sym/27 pcs) 4 img

Spline

19.06.2021

We will use the ames data set from the modeldata library. It can be loaded using the following code. Ames Housing Data library(tidymodels) library(tidyverse) data("ames") ames ## # A tibble: 2,930 x 74 ## MS_SubClass MS_Zoning Lot_Frontage Lot_Area Street Alley Lot_Shape ## * <fct> <fct> <dbl> <int> <fct...

1645 sym R (14869 sym/51 pcs) 3 img