Publications by Mustafa Arslan

Probability Simulations 2

10.10.2021

Bernoulli and Binomial Distributions The sample space of a Bernoulli experiment has only two possible outcomes: failure or success. pbinom(r, n, p) returns the probability P(Y <= r) where Y ~ Binom(n, p); r is the number of successes, n is the size (number of trials) and p is the probability of success. Question 1. Forty percent of workers in Newtown support tax reform. Take a random sample of twelve suppo...

6191 sym R (4352 sym/97 pcs)
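
For the binomial entry above, a minimal sketch of how pbinom relates to dbinom and to a simulation. The values n = 12 and p = 0.4 come from Question 1; the cutoff r = 3 is a hypothetical value chosen only for illustration, since the question text is cut off in this listing.

```r
# n and p are taken from Question 1; r = 3 is an assumed, illustrative cutoff.
n <- 12
p <- 0.4
r <- 3

# Exact cumulative probability P(Y <= r) for Y ~ Binom(n, p)
exact <- pbinom(r, size = n, prob = p)

# The same quantity written as a sum of point probabilities P(Y = 0), ..., P(Y = r)
by_sum <- sum(dbinom(0:r, size = n, prob = p))

# Simulation-based approximation: share of random draws that are <= r
set.seed(1)
draws <- rbinom(100000, size = n, prob = p)
simulated <- mean(draws <= r)

print(c(exact = exact, by_sum = by_sum, simulated = simulated))
```

The first two numbers should agree up to floating-point error, and the simulated share should be close to both.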

Probability Simulations 1

10.10.2021

Question 1: In an office, there are 10 salespeople, four women and six men. Three women and two men will be chosen at random. Use simulations in R to approximate the probability. nloop=100000 sample1 = c(rep("W",4),rep("M",6)) sample1 ## [1] "W" "W" "W" "W" "M" "M" "M" "M" "M" "M" count = 0 for (iloop in 1:nloop){ MW = sample(sample1,5, r...

5339 sym R (11080 sym/86 pcs)
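
For the simulation entry above, a minimal sketch of one way to check such an estimate against an exact value. It is not the post's own code, which is cut off in this listing. It assumes five people are drawn without replacement and that the event of interest is exactly three women among them. The names staff, hits, estimate and exact are illustrative.

```r
set.seed(1)
nloop <- 100000

# Four women and six men, as in the question
staff <- c(rep("W", 4), rep("M", 6))

# Assumption: draw five people without replacement and flag the draws
# that contain exactly three women (and therefore two men)
hits <- replicate(nloop, sum(sample(staff, 5, replace = FALSE) == "W") == 3)
estimate <- mean(hits)

# Exact hypergeometric probability: choose(4, 3) * choose(6, 2) / choose(10, 5)
exact <- dhyper(3, m = 4, n = 6, k = 5)

print(c(estimate = estimate, exact = exact))
```

With 100000 repetitions the simulated share should land within a few thousandths of the exact value.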

Support Vector Machine 1

27.09.2021

Introduction In this project, I am going to evaluate the College data set using Regularization methods and make a prediction on new data. Support vector machines are powerful supervised learning models, with associated learning algorithms, that analyze data for classification and regression analysis. Th...

4265 sym R (13340 sym/83 pcs) 1 tbl

CSU R Course Notes 1

26.09.2021

1 Setup library(tibble) # special type of data frame library(magrittr) # pipes library(dplyr) # data manipulation library(ggplot2) # pretty plots library(tidyr) # reshape data frames; mostly for ggplots 2 Introduction This is a very simple streamlined set of example code to describe some basic summ...

9028 sym R (6613 sym/35 pcs) 4 img

Regularization Methods 1

22.09.2021

Introduction In this project, I am going to evaluate the College data set using Regularization methods and make a prediction on new data. Data: The College data set contains statistics for a large number of US colleges from the 1995 issue of US News and World Report. It is a data frame with 777 observations on the following 18 variables: Private: A factor w...

5040 sym R (13210 sym/95 pcs) 18 img 1 tbl

Subset Selection Methods 1

19.09.2021

Introduction Subset selection is very important in Data Science and Analytics. Subset selection involves identifying a subset of the p predictors that we believe to be related to the response variable. Common subset selection methods are: 1. Forward stepwise selection 2. Backward stepwise selection Another method is Shrinkage. This approaches in...

1903 sym R (6825 sym/40 pcs) 3 img

Model Comparison 1

18.09.2021

Introduction In this section, I compared common machine learning models. I answered Q10 of Chapter 4 in the book An Introduction to Statistical Learning. Data: This question should be answered using the Weekly data set, which is part of the ISLR package. Weekly data has 1089 observations on the following 9 variables. Year: The year tha...

6877 sym R (9862 sym/80 pcs) 3 img

Multinomial Logistic Regression

15.09.2021

Introduction Data: This data set is called Heart Failure Prediction and is available on Kaggle.com. It has 2126 observations of 22 variables. 2126 fetal cardiotocograms (CTGs) were automatically processed and the respective diagnostic features measured. The CTGs were also classified by three expert obstetricians and a consensus classification label ...

3064 sym R (13907 sym/65 pcs) 3 img

Advanced Clustering Methods

04.09.2021

Introduction Cluster analysis is a statistical method for processing data. It works by organizing items into groups, or clusters, on the basis of how closely associated they are. Cluster analysis is a method of unsupervised learning and a powerful data-mining tool for grouping similar objects. Data: The data set contains 3 classes of 50 instan...

4171 sym R (6264 sym/47 pcs) 10 img

Cluster Analysis Part II

03.09.2021

Introduction This is Part II of the hierarchical clustering analysis. In this part, I will add noise to the data set and observe whether it makes any difference. Note that you can find the other parts from the links below: Part I covers question 1. Part II covers question 2. Part III covers question 3. Part IV covers question 4. Data: The data fi...

2644 sym R (5093 sym/31 pcs) 10 img