Publications by Mustafa Arslan
Logistic Regression 3
Introduction Data: This data set is called Heart Failure Prediction and is available on Kaggle.com. It has 299 observations of 13 variables. Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused b...
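A minimal sketch of fitting a logistic regression in base R. The Heart Failure Prediction data are not bundled with R, so this assumes the built-in `mtcars` data as a stand-in and models the binary `am` (0 = automatic, 1 = manual) from car weight `wt`. The 0.5 classification threshold is also an assumption, not the author's setting.

```r
# Stand-in data: built-in mtcars (the Heart Failure data would be read from a CSV)
data(mtcars)

# Logistic regression: binary outcome am, predictor wt (weight in 1000 lbs)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probabilities that each car has a manual transmission
probs <- predict(fit, type = "response")

# Classify with an assumed 0.5 threshold and summarise the results
pred_class <- ifelse(probs > 0.5, 1, 0)
conf_mat <- table(Predicted = pred_class, Actual = mtcars$am)
accuracy <- mean(pred_class == mtcars$am)

print(summary(fit)$coefficients)
print(conf_mat)
print(accuracy)
```

The accuracy here is measured on the training rows, so it is optimistic; the resampling entries below are the usual way to estimate test error.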
Model Comparison 2
Introduction In this section, I compare common machine learning models by working through Exercise 11 in Chapter 4 of An Introduction to Statistical Learning. Data: Auto data set, a data frame with 392 observations on the following 9 variables. mpg: miles per gallon; cylinders: number of cylinders between 4 and 8; displacement: engine dis...
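ISLR's Exercise 11 builds a binary response `mpg01` (1 when `mpg` is above its median, else 0) and compares classifiers by test error. The sketch below mirrors that workflow on synthetic data, since `Auto` requires the ISLR package; the simulated features `x1` and `x2`, the 50/50 split, and the choice of logistic regression versus LDA (from the `MASS` package) are assumptions for illustration only.

```r
library(MASS)  # lda(); a recommended package shipped with most R installs

set.seed(11)
n <- 400
# Synthetic features standing in for the Auto predictors
x1 <- rnorm(n)
x2 <- rnorm(n)
score <- 1.5 * x1 - 1.0 * x2 + rnorm(n)
# Binary response built like mpg01: 1 if above the median, otherwise 0
y <- as.integer(score > median(score))
df <- data.frame(y = factor(y), x1 = x1, x2 = x2)

# Random 50/50 train/test split
train_idx <- sample(n, n / 2)
train <- df[train_idx, ]
test <- df[-train_idx, ]

# Logistic regression, classified at a 0.5 probability threshold
logit_fit <- glm(y ~ x1 + x2, data = train, family = binomial)
logit_prob <- predict(logit_fit, newdata = test, type = "response")
logit_pred <- ifelse(logit_prob > 0.5, "1", "0")

# Linear discriminant analysis
lda_fit <- lda(y ~ x1 + x2, data = train)
lda_pred <- predict(lda_fit, newdata = test)$class

# Test error rate of each model on the held-out half
test_error <- c(
  logistic = mean(logit_pred != as.character(test$y)),
  lda = mean(as.character(lda_pred) != as.character(test$y))
)
print(test_error)
```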
Resampling Methods 1
Introduction In this section, I implement the following resampling methods on the Heart Failure Prediction data set and compare their test error rates: the validation set approach, Leave-One-Out Cross-Validation (LOOCV), K-fold cross-validation, and the bootstrap. Data: This data set is called Heart Failure Prediction and is available o...
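The three cross-validation schemes listed above can be sketched in base R for a logistic regression. This is a minimal illustration on simulated data, not the author's code: the variables `x` and `y` and the helper `misclass` are hypothetical, and each method reports an estimated misclassification rate.

```r
set.seed(42)
# Simulated binary-outcome data (hypothetical stand-in for the Heart Failure data)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))
df <- data.frame(x = x, y = y)

# Misclassification rate of a fitted logistic model on held-out rows
misclass <- function(fit, newdata) {
  prob <- predict(fit, newdata = newdata, type = "response")
  mean((prob > 0.5) != newdata$y)
}

# 1. Validation set approach: a single random 50/50 split
train_idx <- sample(n, n / 2)
fit_val <- glm(y ~ x, data = df[train_idx, ], family = binomial)
val_error <- misclass(fit_val, df[-train_idx, ])

# 2. LOOCV: n fits, each leaving out exactly one observation
loocv_errors <- vapply(seq_len(n), function(i) {
  fit <- glm(y ~ x, data = df[-i, ], family = binomial)
  misclass(fit, df[i, , drop = FALSE])
}, numeric(1))
loocv_error <- mean(loocv_errors)

# 3. K-fold CV (K = 10): every row lands in exactly one held-out fold
k <- 10
folds <- sample(rep(seq_len(k), length.out = n))
kfold_errors <- vapply(seq_len(k), function(j) {
  fit <- glm(y ~ x, data = df[folds != j, ], family = binomial)
  misclass(fit, df[folds == j, ])
}, numeric(1))
kfold_error <- mean(kfold_errors)

print(c(validation = val_error, loocv = loocv_error, kfold = kfold_error))
```

The validation estimate depends on one random split, while LOOCV and K-fold average over many fits; K-fold needs only K model fits instead of n.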
Resampling Methods 2
Introduction In this section, I implement the cross-validation and bootstrap methods below on the Heart Failure Prediction data set using the caret package, and use them to estimate the prediction performance of a Naive Bayes classifier: 1. Data split 2. Leave-One-Out Cross-Validation (LOOCV) 3. K-fold cross-validation 4. Repeated K-fold cross-validation 5. Bootstrap Met...
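caret runs these resampling schemes for you (for example via `trainControl(method = "repeatedcv")` or `trainControl(method = "boot")`); the underlying ideas can also be written out in base R. The sketch below estimates the misclassification rate with repeated K-fold cross-validation and with an out-of-bag bootstrap. It swaps in logistic regression for Naive Bayes, which needs an add-on package, and uses simulated data; the functions `repeated_kfold_error` and `bootstrap_error` are hypothetical names.

```r
set.seed(7)
# Simulated data (hypothetical stand-in for the Heart Failure data)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.2 * x))
df <- data.frame(x = x, y = y)

# Repeated K-fold CV: rerun K-fold `repeats` times with fresh random folds
repeated_kfold_error <- function(data, k = 10, repeats = 3) {
  errs <- replicate(repeats, {
    folds <- sample(rep(seq_len(k), length.out = nrow(data)))
    mean(vapply(seq_len(k), function(j) {
      fit <- glm(y ~ x, data = data[folds != j, ], family = binomial)
      prob <- predict(fit, newdata = data[folds == j, ], type = "response")
      mean((prob > 0.5) != data$y[folds == j])
    }, numeric(1)))
  })
  mean(errs)
}

# Bootstrap: fit on rows drawn with replacement, score on the out-of-bag rows
bootstrap_error <- function(data, B = 200) {
  errs <- replicate(B, {
    idx <- sample(nrow(data), replace = TRUE)
    oob <- setdiff(seq_len(nrow(data)), idx)
    fit <- glm(y ~ x, data = data[idx, ], family = binomial)
    prob <- predict(fit, newdata = data[oob, ], type = "response")
    mean((prob > 0.5) != data$y[oob])
  })
  mean(errs)
}

print(c(repeated_cv = repeated_kfold_error(df), bootstrap = bootstrap_error(df)))
```

Repeating K-fold with new folds reduces the variance that comes from any single fold assignment; each bootstrap sample leaves out roughly a third of the rows, which serve as its test set.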
Decision Tree 1
Introduction In this project, I implement decision tree models on the Carseats data set. Decision Tree: Decision tree methodology is a commonly used data mining method for building classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population int...
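A minimal sketch of growing and evaluating a classification tree with the `rpart` package (a recommended package shipped with R). Carseats requires the ISLR package, so this assumes the built-in `iris` data as a stand-in; the 100/50 train/test split is also an assumption.

```r
library(rpart)  # recursive partitioning trees

set.seed(1)
# Stand-in data: built-in iris, split into training and test rows
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Grow a classification tree: each split partitions the rows on one covariate
tree <- rpart(Species ~ ., data = train, method = "class")
print(tree)

# Predict classes for held-out rows and estimate test accuracy
pred <- predict(tree, newdata = test, type = "class")
conf_mat <- table(Predicted = pred, Actual = test$Species)
test_accuracy <- mean(pred == test$Species)
print(conf_mat)
print(test_accuracy)
```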
CSU R Course Notes 2
1 Data Frames A data frame is one of the most commonly used data structures in R (see the tibble object in the Data Wrangling lab). It is simply a list of equal-length vectors: each vector is treated as a column, and the elements of the vectors form the rows. Most often a data frame will be constructed by reading in from a file, but we can also create them ...
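A short base R illustration of the "list of equal-length vectors" idea. The vectors `name`, `age`, and `smoker` are made-up example values, not data from the course notes.

```r
# Three equal-length vectors become the three columns of a data frame
name <- c("Ann", "Bob", "Cid")
age <- c(28, 35, 42)
smoker <- c(FALSE, TRUE, FALSE)
df <- data.frame(name = name, age = age, smoker = smoker)

print(df)
str(df)

# Under the hood a data frame is a list of columns
is.list(df)   # TRUE
length(df)    # 3, the number of columns
df$age        # a column is an ordinary vector
df[2, ]       # one row: the 2nd element of every column vector

# Columns whose lengths cannot be reconciled (3 vs 2) are rejected
try(data.frame(a = 1:3, b = 1:2))
```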
CSU R Course Notes 3
1 Setup & Configuration The code below loads the libraries you will need for this tutorial. If any of them is missing, install it with install.packages("pkg"):
options(warnPartialMatchArgs = FALSE) # don't want these warnings
options(width = 100)
library(tibble) # special type of data frame
library(magrittr) # pipes
libr...
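Rather than installing packages one at a time, a small helper can list which required packages are missing. This is a sketch: `missing_packages` is a hypothetical helper, and the package vector contains only the names visible above, so extend it with the rest of the tutorial's libraries.

```r
# Packages named in the setup code (extend with the remaining libraries)
pkgs <- c("tibble", "magrittr")

# Return the subset of `pkgs` whose namespace cannot be loaded on this machine
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_packages(pkgs)
# Install only in an interactive session, so non-interactive scripts never hit the network
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```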
CSU R Course Notes 4
Load necessary libraries
options(warnPartialMatchArgs = FALSE) # don't want these warnings
library(magrittr) # pipes
library(tibble) # tibbles
library(dplyr) # data wrangling
library(purrr) # iteration
library(ggplot2) # tidy plotting
library(broom) # summarize model objects uniformly
library(yardstick)...
CSU R Course Notes 5
Load necessary libraries
options(warnPartialMatchArgs = FALSE) # don't want these warnings
library(magrittr) # pipes
library(tibble) # tibbles
library(dplyr) # data wrangling
library(boot) # for `cv.glm`
library(purrr) # iteration
library(ggplot2) # tidy plotting
library(ISLR) # Auto data set ...
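The `boot` package loaded above provides `cv.glm()`, which returns the cross-validated prediction error in its `delta` component (the first element is the raw estimate). A minimal sketch, assuming the built-in `mtcars` data in place of `Auto`, compares polynomial fits of `mpg` on `hp` using LOOCV (the default when `K` is omitted) and 10-fold CV.

```r
library(boot)  # cv.glm()

set.seed(17)  # K-fold CV assigns folds at random

# glm() with the default gaussian family is ordinary least squares,
# which is the model form cv.glm() refits on each training subset
fits <- list(
  linear = glm(mpg ~ hp, data = mtcars),
  quadratic = glm(mpg ~ poly(hp, 2), data = mtcars),
  cubic = glm(mpg ~ poly(hp, 3), data = mtcars)
)

# One row per model: LOOCV and 10-fold estimates of test MSE
cv_results <- t(sapply(fits, function(fit) {
  c(loocv = cv.glm(mtcars, fit)$delta[1],
    kfold10 = cv.glm(mtcars, fit, K = 10)$delta[1])
}))
print(cv_results)
```

LOOCV refits the model once per row, so 10-fold CV is usually the cheaper choice on larger data sets such as `Auto`.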
Decision Tree 2
Introduction In this project, I implement decision tree models on the OJ data set. Data: Orange Juice (OJ), a data frame with 1070 observations on the following 18 variables. Purchase: a factor with levels CH and MM indicating whether the customer purchased Citrus Hill or Minute Maid orange juice; WeekofPurchase: week of purchase; StoreID:...
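A common step when fitting classification trees is cost-complexity pruning. The sketch below uses `rpart` and assumes the built-in `iris` data instead of `OJ`, which needs the ISLR package. It grows a large tree, reads the cross-validated error from `cptable`, and prunes at the complexity parameter with the lowest `xerror`. The excerpt does not say whether the original project prunes this way, and `n_leaves` is a hypothetical helper.

```r
library(rpart)

set.seed(2)  # rpart's internal cross-validation uses random folds
# Grow a deliberately large tree: cp = 0 and minsplit = 2 let it split until nodes are pure
full_tree <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0, minsplit = 2))

# Cross-validated error (xerror) for each candidate subtree
cp_table <- full_tree$cptable
print(cp_table)

# Prune at the complexity parameter with the smallest cross-validated error
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned_tree <- prune(full_tree, cp = best_cp)

# Count terminal nodes in a fitted rpart tree
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")
print(c(full = n_leaves(full_tree), pruned = n_leaves(pruned_tree)))
```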