Publications by Mael Illien

data622hw3

02.04.2021

Setup library(skimr) library(tidyverse) library(caret) # For featureplot, classification report library(corrplot) # For correlation matrix library(AppliedPredictiveModeling) library(mice) # For data imputation library(VIM) # For missing data visualization library(gridExtra) # For grid plots library(rpart) # For Decision Trees models library(rpart...

24776 sym R (11765 sym/54 pcs) 25 img 6 tbl

data622finalproject

20.05.2021

Setup library(skimr) library(tidyverse) library(gridExtra) library(readr) library(dplyr) library(caret) library(naivebayes) library(factoextra) # For PCA plots library(e1071) library(Rtsne) library(RColorBrewer) library(gbm) library(randomForest) Data Description mnist_raw <- read_csv("https://pjreddie.com/media/files/mnist_train.csv", col_names...

11285 sym R (17609 sym/41 pcs) 14 img

data622hw4

02.05.2021

Setup library(skimr) library(tidyverse) library(caret) # For featureplot, classification report library(corrplot) # For correlation matrix and PCA contribution plots library(AppliedPredictiveModeling) library(mice) # For data imputation library(VIM) # For missing data visualization library(gridExtra) # For grid plots library(pROC) # For AUC calcul...

24795 sym R (17962 sym/50 pcs) 27 img 3 tbl

penguins-classification

06.09.2021

Introduction Of the many supervised classification methods, this project studies LDA (Linear Discriminant Analysis), QDA (Quadratic Discriminant Analysis), NB (Naive Bayes) and KNN (K-Nearest Neighbors), using the classic Palmer Penguins dataset. LDA and its cousin QDA both assume that observations from each class are...

10649 sym R (11596 sym/28 pcs) 11 img 3 tbl

loan-approval-trees

06.09.2021

Setup library(skimr) library(tidyverse) library(caret) # For featureplot, classification report library(corrplot) # For correlation matrix library(AppliedPredictiveModeling) library(mice) # For data imputation library(VIM) # For missing data visualization library(gridExtra) # For grid plots library(rpart) # For Decision Trees models library(rpart...

20646 sym R (8760 sym/45 pcs) 18 img 3 tbl

unsupervised-pca-clustering

06.09.2021

Setup library(skimr) library(tidyverse) library(caret) # For featureplot, classification report library(corrplot) # For correlation matrix and PCA contribution plots library(AppliedPredictiveModeling) library(mice) # For data imputation library(VIM) # For missing data visualization library(gridExtra) # For grid plots library(dendextend) # For dend...

17813 sym R (8914 sym/28 pcs) 24 img 3 tbl

recognizing-digits

05.09.2021

Setup library(skimr) library(tidyverse) library(gridExtra) library(readr) library(dplyr) library(caret) library(naivebayes) library(factoextra) # For PCA plots library(e1071) library(Rtsne) library(RColorBrewer) library(gbm) library(randomForest) Data Description mnist_raw <- read_csv("https://pjreddie.com/media/files/mnist_train.csv", col_names...

11283 sym R (17609 sym/41 pcs) 14 img

US Company Distribution

04.09.2021

Introduction The aim of this work is to apply principles of data visualization using the tidyverse and ggplot2 packages while exploring the 5,000 fastest-growing companies in the US in 2020, as compiled by Inc. magazine. One of the key principles of data visualization is the data-ink ratio. The concept behind the data-ink ratio is that non-data...

3046 sym R (3874 sym/11 pcs) 3 img 5 tbl

migration-inference

05.09.2021

Introduction Given the extraordinary surge in migration in recent years, it is interesting to ask whether the demographics of those groups have changed in any particular way. The data presented here does not account for refugees or population outflows, but it does provide the opportunity to explore some macro trends of migration, which in th...

7990 sym R (6777 sym/29 pcs) 6 img 7 tbl