Publications by Mustafa Arslan
Probability Simulations 2
Bernoulli and Binomial Distributions: The sample space of a Bernoulli experiment has only two possible outcomes: failure or success. pbinom(q = r, size = n, prob = p) returns the probability P(Y <= r), where Y ~ Binom(n, p), r is the number of successes, n is the number of trials (size), and p is the probability of success. Question 1. Forty percent of workers in Newtown support tax reform. Take a random sample of twelve suppo...
6191 sym R (4352 sym/97 pcs)
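As a rough illustration of the binomial entry above, the sketch below computes a cumulative probability with pbinom and checks it against a simulation. The values n = 12 and p = 0.4 come from the excerpt; the cutoff r = 5 is a hypothetical choice, since the original question is truncated.

```r
# Sketch only: r is a hypothetical cutoff, not taken from the original question.
n <- 12    # number of trials (sample size from the excerpt)
p <- 0.4   # probability of success (40% support from the excerpt)
r <- 5     # hypothetical number of successes

# Exact cumulative probability P(Y <= r) for Y ~ Binom(n, p)
exact <- pbinom(r, size = n, prob = p)

# The same quantity as a sum of point probabilities P(Y = 0), ..., P(Y = r)
by_sum <- sum(dbinom(0:r, size = n, prob = p))

# Monte Carlo approximation: share of simulated samples with at most r successes
set.seed(1)
draws <- rbinom(100000, size = n, prob = p)
sim_est <- mean(draws <= r)

print(c(exact = exact, by_sum = by_sum, simulated = sim_est))
```

The exact and summed values should agree to floating-point precision, and the simulated value should land close to both.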
Probability Simulations 1
Question 1: In an office, there are 10 salespeople: four women and six men. Three women and two men will be chosen at random. Use simulations in R to approximate the probability. nloop=100000 sample1 = c(rep("W",4),rep("M",6)) sample1 ## [1] "W" "W" "W" "W" "M" "M" "M" "M" "M" "M" count = 0 for (iloop in 1:nloop){ MW = sample(sample1,5, r...
5339 sym R (11080 sym/86 pcs)
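The loop in the simulation entry above is cut off, so the following is only a sketch of how such a simulation might be completed. It assumes that five people are drawn without replacement and that the event of interest is "exactly three women and two men"; the names office, picked, sim_prob, and exact_prob are illustrative and not from the original.

```r
# Sketch under assumptions: 5 people drawn without replacement,
# event = exactly 3 women (and therefore 2 men) in the draw.
set.seed(42)
nloop <- 100000
office <- c(rep("W", 4), rep("M", 6))   # 4 women, 6 men

count <- 0
for (iloop in 1:nloop) {
  picked <- sample(office, 5, replace = FALSE)
  if (sum(picked == "W") == 3) {
    count <- count + 1
  }
}
sim_prob <- count / nloop

# Exact hypergeometric probability for comparison
exact_prob <- choose(4, 3) * choose(6, 2) / choose(10, 5)

print(c(simulated = sim_prob, exact = exact_prob))
```

Comparing the simulated proportion with the counting formula is a quick sanity check that the loop counts the intended event.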
Support Vector Machine 1
Introduction In this project, I am going to evaluate the College data set using Regularization methods and make a prediction on new data. Support vector machines are powerful supervised learning models, with associated learning algorithms, that analyze data for classification and regression analysis. Th...
4265 sym R (13340 sym/83 pcs) 1 tbl
CSU R Course Notes 1
1 Setup library(tibble) # special type of data frame library(magrittr) # pipes library(dplyr) # data manipulation library(ggplot2) # pretty plots library(tidyr) # reshape data frames; mostly for ggplots 2 Introduction This is a very simple, streamlined set of example code to describe some basic summ...
9028 sym R (6613 sym/35 pcs) 4 img
Regularization Methods 1
Introduction In this project, I am going to evaluate the College data set using Regularization methods and make a prediction on new data. Data: The College data set contains statistics for a large number of US Colleges from the 1995 issue of US News and World Report. The data frame has 777 observations on the following 18 variables: Private: A factor w...
5040 sym R (13210 sym/95 pcs) 18 img 1 tbl
Subset Selection Methods 1
Introduction Subset selection is very important in Data Science and Analytics. Subset selection involves identifying a subset of the p predictors that we believe to be related to the response variable. Common subset selection methods are: 1. Forward stepwise selection 2. Backward stepwise selection Another method is shrinkage. This approach in...
1903 sym R (6825 sym/40 pcs) 3 img
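The two stepwise methods named in the subset selection entry above can be sketched with base R's step() function. This is an assumption-laden illustration rather than the original analysis: it uses the built-in mtcars data as a stand-in, mpg as a hypothetical response, and step()'s default AIC criterion.

```r
# Sketch only: mtcars and the response mpg are stand-ins, not the original data.
null_model <- lm(mpg ~ 1, data = mtcars)   # intercept only
full_model <- lm(mpg ~ ., data = mtcars)   # all available predictors

# Forward stepwise: start from the null model and add predictors one at a time
forward <- step(
  null_model,
  scope = list(lower = formula(null_model), upper = formula(full_model)),
  direction = "forward",
  trace = 0
)

# Backward stepwise: start from the full model and drop predictors one at a time
backward <- step(full_model, direction = "backward", trace = 0)

print(formula(forward))
print(formula(backward))
```

The two directions do not have to agree on the final subset, which is one reason to compare the selected models on held-out data.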
Model Comparison 1
Introduction In this section, I compare common machine learning models. I answer Q10 of Chapter 4 from the book An Introduction to Statistical Learning. Data: This question should be answered using the Weekly data set, which is part of the ISLR package. The Weekly data set has 1089 observations on the following 9 variables. Year: The year tha...
6877 sym R (9862 sym/80 pcs) 3 img
Multinomial Logistic Regression
Introduction Data: This data set is called Heart Failure Prediction and is available on Kaggle.com. It has 2126 observations of 22 variables. 2126 fetal cardiotocograms (CTGs) were automatically processed and the respective diagnostic features measured. The CTGs were also classified by three expert obstetricians and a consensus classification label ...
3064 sym R (13907 sym/65 pcs) 3 img
Advanced Clustering Methods
Introduction Cluster analysis is a statistical method for processing data. It works by organizing items into groups, or clusters, on the basis of how closely associated they are. Cluster analysis is a method of unsupervised learning and is a powerful data-mining tool for grouping similar objects. Data: The data set contains 3 classes of 50 instan...
4171 sym R (6264 sym/47 pcs) 10 img
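To make the clustering idea in the entry above concrete, here is a minimal base R sketch of hierarchical clustering. It assumes the built-in iris data as a stand-in, because its shape (3 classes of 50 observations) matches the truncated description. Ward linkage and k = 3 are illustrative choices, not necessarily those of the original analysis.

```r
# Sketch only: iris is an assumed stand-in for the truncated data description.
features <- scale(iris[, 1:4])        # standardize the four numeric measurements

d  <- dist(features)                  # Euclidean distances between observations
hc <- hclust(d, method = "ward.D2")   # agglomerative clustering, Ward linkage

groups <- cutree(hc, k = 3)           # cut the tree into three clusters

# Cross-tabulate clusters against the known species labels
print(table(cluster = groups, species = iris$Species))
```

The cross-tabulation shows how well the unsupervised groups line up with the known classes, which the clustering itself never sees.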
Cluster Analysis Part II
Introduction This is Part II of the hierarchical clustering analysis. In this part, I add noise to the data set and observe whether it makes any difference. Note that you can find the other parts from the links below: Part I covers question 1. Part II covers question 2. Part III covers question 3. Part IV covers question 4. Data: The data fi...
2644 sym R (5093 sym/31 pcs) 10 img