Publications by quoc_nguyen

multinomial_logistic_regression

23.12.2022

This project applies multinomial logistic regression to classify observations into more than two classes. The built-in Iris data set is used for this classification problem. data("iris") head(iris) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 ...

1277 sym R (2262 sym/13 pcs)

Ridge_regression

26.12.2022

This project studies ridge regression. library(ISLR2) ## Warning: package 'ISLR2' was built under R version 4.2.2 library(tidyverse) ## Warning: package 'tidyverse' was built under R version 4.2.2 ## ── Attaching packages ───────────────────────────────────...

3242 sym R (14772 sym/86 pcs) 4 img

Lasso_regression

26.12.2022

In this project, I will learn about lasso regression and how to apply it to feature selection. library(ISLR2) ## Warning: package 'ISLR2' was built under R version 4.2.2 library(ggplot2) ## Warning: package 'ggplot2' was built under R version 4.2.2 library(magrittr) library(tidyverse) ## Warning: package 'tidyverse' was built ...

1431 sym R (5427 sym/39 pcs) 1 img

Logistic Regression

30.12.2022

Logistic regression is a common classification method. In this project, I will study this method and explain why logistic regression is more appropriate than linear regression for classification problems. Why not linear regression? We know linear regression takes the form \(f(x)=w^Tx\) where \(w\) is the vector of th...

3692 sym R (4370 sym/23 pcs) 3 img

Principal Component Analysis

01.01.2023

In this project, I will study PCA and how to apply it in R. Principal component analysis (PCA) is known as a method for dimensionality reduction. It is useful when the dataset contains many predictors and not all of them are important for the analysis. Therefore, PCA will find the most important variable and reduce the dimension ...

3124 sym R (3316 sym/37 pcs) 5 img

Birth_analysis

02.01.2023

In this project, we will analyze a dataset of birth information on babies born in the USA during 2006. Each record is one birth. The dataset is used as an example in the book “R in a Nutshell” from O’Reilly Media. load("C:/Users/Quoc Nguyen/Downloads/births2006.smpl.rda") df<-births2006.smpl dim(df) ## [1] 427323 13 str(df) ## 'data.frame'...

3644 sym R (3579 sym/25 pcs) 9 img