Publications by quoc_nguyen
Document
Problem 1
#Read data
p088_data <- read.table('P088.txt',header = TRUE)
p088_data <- p088_data[,-c(1)]
head(p088_data)
##    Age   HS Income Black Female Price Sales
## 1 27.0 41.3   2948  26.2   51.7  42.7  89.8
## 2 22.9 66.7   4644   3.0   45.7  41.8 121.3
## 3 26.3 58.1   3665   3.0   50.8  38.5 115.2
## 4 29.1 39.9   2878  18.3   51.5 3...
6405 sym 3 img
Logistic Regression approaches in predicting Heart Failure.
Introduction
Logistic regression is a commonly used model when the response variable is binary. When there are more than two classes, multinomial logistic regression is preferred. Logistic regression can not only predict the probability of an outcome for an observation based on its predictors, but it is also helpful in measu...
12471 sym R (27207 sym/66 pcs) 5 img 1 tbl
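For the logistic regression entry above, a minimal sketch of fitting a binary logistic model in R may help. It assumes the built-in `mtcars` dataset as a stand-in, since the heart failure data is not shown in the excerpt; the 0.5 cutoff is likewise only an illustrative choice.

```r
# Stand-in data: the built-in mtcars dataset, NOT the heart failure data.
# am is a binary response (0 = automatic, 1 = manual); wt is weight in 1000 lbs.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability that am == 1 for each observation
probs <- predict(fit, type = "response")

# Exponentiated coefficients can be read as odds ratios
odds_ratios <- exp(coef(fit))

# Classify with a 0.5 cutoff (an illustrative assumption, not a recommendation)
pred_class <- as.integer(probs > 0.5)

# Confusion table of predicted versus observed classes
print(table(predicted = pred_class, actual = mtcars$am))
```

Reading the exponentiated coefficients as odds ratios is one common way to interpret how a predictor relates to the outcome; the source excerpt is truncated, so this may or may not be the interpretation its author uses.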
Data Science Job Analysis
Harvard Business Review called the role of data scientist “the sexiest job of the 21st century”. Data scientists have become more and more popular in the job market this decade, and the role is essential across various fields such as finance, banking, industry, and sports. Data scientists play an important role in enhancing their co...
5406 sym R (9627 sym/59 pcs) 12 img
Titanic Disaster Prediction
Introduction
The sinking of the Titanic in the early morning hours of 15 April 1912 is considered one of the deadliest maritime disasters. It caused the deaths of more than 1,500 people. In this project, we use the Titanic disaster dataset from Kaggle to analyze the data and apply machine learning techniques to predict the survivors based on passenger i...
9826 sym R (20663 sym/105 pcs) 21 img
Statistical Computing Project
#Load census dataset
load("D:\\fall_2022\\MATH5640_Comp.Stat/census.RData")
Part 1: Loading and cleaning
Problem 1. How many states are represented among the 74020 census tracts? How many counties?
old_state<-unique(census$State_name)
length(unique(census$State_name))
## [1] 52
How many counties?
length(unique(census$County_name))
## [1] 1955...
6640 sym Python (14278 sym/71 pcs) 4 img
Mall_Customer_Segmentation
Introduction
This project studies clustering analysis. I use the Mall Customer dataset from Kaggle for customer segmentation. The dataset contains information about different mall customers, including their gender, age, annual income, and spending score.
#Import dataset
library("readxl")
df<-read.csv("Mall_Customers.c...
5598 sym R (5509 sym/32 pcs) 12 img
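For the customer segmentation entry above, the following is a minimal sketch of one common clustering method, k-means, in base R. The excerpt does not say which method the project uses, so k-means is an assumption here, as are the simulated `income` and `score` columns standing in for the Mall Customer data and the choice of three clusters.

```r
# Hypothetical stand-in for the Mall Customer data: two simulated columns.
set.seed(42)
customers <- data.frame(
  income = c(rnorm(50, 30, 5), rnorm(50, 60, 5), rnorm(50, 90, 5)),
  score  = c(rnorm(50, 20, 5), rnorm(50, 50, 5), rnorm(50, 80, 5))
)

# Standardize so both variables contribute equally to the distance
scaled <- scale(customers)

# k-means with 3 clusters (assumed) and several random starts for stability
km <- kmeans(scaled, centers = 3, nstart = 25)

# Cluster sizes and per-cluster means on the original scale
print(table(km$cluster))
print(aggregate(customers, by = list(cluster = km$cluster), FUN = mean))
```

In practice the number of clusters is usually chosen by comparing several values, for example by plotting the total within-cluster sum of squares (`km$tot.withinss`) against the number of centers.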
Argentina Car Prices
This project analyzes trends in the car market in Argentina. We use the argentina_cars dataset from the Kaggle website.
library(readxl)
df<-read.csv("argentina_cars.csv")
head(df)
## money brand model year color fuel_type door gear
## 1 10350000 Toyota Corolla Cross 2022 Plateado Nafta...
4938 sym R (6372 sym/33 pcs) 6 img
Visualizing the Nobel prize winners
The Nobel Prizes are among the best-known prizes, awarded yearly to those who have contributed the most to science and to the world. Prizes are now awarded in 6 different areas: Chemistry, Literature, Medicine, Physics, Economics, and Peace. The first Nobel Prizes were awarded back in 1901, and at that time, it is jus...
4935 sym R (15717 sym/50 pcs) 6 img
linear regression problem
This project aims to study and understand linear regression in R. The Advertising dataset is used for the regression analysis, and we find out whether there are any associations between sales and three advertising channels, namely TV, radio, and newspaper. Now, we load the dataset and study their relationship.
library(readxl)
df<-read.csv("D...
4712 sym R (4543 sym/30 pcs) 3 img
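For the linear regression entry above, a minimal sketch of regressing sales on the three channels could look like the following. The data here is simulated and purely hypothetical (the real Advertising file path is truncated in the excerpt); only the column names `TV`, `radio`, `newspaper`, and `sales` follow the description.

```r
# Simulated stand-in for the Advertising data (hypothetical values).
set.seed(1)
n <- 200
ads <- data.frame(
  TV        = runif(n, 0, 300),
  radio     = runif(n, 0, 50),
  newspaper = runif(n, 0, 100)
)

# In this simulation, sales depend on TV and radio but not on newspaper
ads$sales <- 3 + 0.05 * ads$TV + 0.2 * ads$radio + rnorm(n, sd = 1.5)

# Multiple linear regression of sales on the three channels
fit <- lm(sales ~ TV + radio + newspaper, data = ads)

# Estimates, standard errors, t statistics, and p-values
print(summary(fit)$coefficients)
```

The p-value column of the coefficient table is one way to judge whether each channel is associated with sales once the other two are held fixed.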
linear discriminant regression
In this project, I study linear discriminant regression (LDR). We use the Iris dataset available in R to learn about LDR.
data("iris")
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 ...
4409 sym R (11195 sym/44 pcs) 4 img
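For the Iris entry above, here is a minimal sketch using `lda()` from the `MASS` package, which implements what is usually called linear discriminant analysis (LDA). Whether the project itself uses `MASS::lda` is an assumption, since the excerpt is cut off before any modeling code.

```r
# MASS ships with standard R installations as a recommended package
library(MASS)

data("iris")

# Fit a linear discriminant model using all four measurements
fit <- lda(Species ~ ., data = iris)

# Predict on the training data (optimistic; a held-out split is better practice)
pred <- predict(fit, iris)

# Confusion table and training accuracy
conf <- table(predicted = pred$class, actual = iris$Species)
accuracy <- mean(pred$class == iris$Species)
print(conf)
print(accuracy)
```

Because the model is evaluated on the same rows it was fitted on, the accuracy here overstates how well it would do on new flowers.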