Publications by Mael Illien
data622hw3
Setup library(skimr) library(tidyverse) library(caret) # For featurePlot, classification report library(corrplot) # For correlation matrix library(AppliedPredictiveModeling) library(mice) # For data imputation library(VIM) # For missing data visualization library(gridExtra) # For grid plots library(rpart) # For decision tree models library(rpart...
24776 sym R (11765 sym/54 pcs) 25 img 6 tbl
data622finalproject
Setup library(skimr) library(tidyverse) library(gridExtra) library(readr) library(dplyr) library(caret) library(naivebayes) library(factoextra) # For PCA plots library(e1071) library(Rtsne) library(RColorBrewer) library(gbm) library(randomForest) Data Description mnist_raw <- read_csv("https://pjreddie.com/media/files/mnist_train.csv", col_names...
11285 sym R (17609 sym/41 pcs) 14 img
data622hw4
Setup library(skimr) library(tidyverse) library(caret) # For featurePlot, classification report library(corrplot) # For correlation matrix and PCA contribution plots library(AppliedPredictiveModeling) library(mice) # For data imputation library(VIM) # For missing data visualization library(gridExtra) # For grid plots library(pROC) # For AUC calcul...
24795 sym R (17962 sym/50 pcs) 27 img 3 tbl
penguins-classification
Introduction This project studies four supervised classification methods, LDA (Linear Discriminant Analysis), QDA (Quadratic Discriminant Analysis), NB (Naive Bayes) and KNN (K-Nearest Neighbors), using the classic Palmer Penguins dataset. LDA and its cousin QDA both assume that observations from each class are...
10649 sym R (11596 sym/28 pcs) 11 img 3 tbl
loan-approval-trees
Setup library(skimr) library(tidyverse) library(caret) # For featurePlot, classification report library(corrplot) # For correlation matrix library(AppliedPredictiveModeling) library(mice) # For data imputation library(VIM) # For missing data visualization library(gridExtra) # For grid plots library(rpart) # For decision tree models library(rpart...
20646 sym R (8760 sym/45 pcs) 18 img 3 tbl
unsupervised-pca-clustering
Setup library(skimr) library(tidyverse) library(caret) # For featurePlot, classification report library(corrplot) # For correlation matrix and PCA contribution plots library(AppliedPredictiveModeling) library(mice) # For data imputation library(VIM) # For missing data visualization library(gridExtra) # For grid plots library(dendextend) # For dend...
17813 sym R (8914 sym/28 pcs) 24 img 3 tbl
recognizing-digits
Setup library(skimr) library(tidyverse) library(gridExtra) library(readr) library(dplyr) library(caret) library(naivebayes) library(factoextra) # For PCA plots library(e1071) library(Rtsne) library(RColorBrewer) library(gbm) library(randomForest) Data Description mnist_raw <- read_csv("https://pjreddie.com/media/files/mnist_train.csv", col_names...
11283 sym R (17609 sym/41 pcs) 14 img
US Company Distribution
Introduction The aim of this work is to apply principles of data visualization using the tidyverse and ggplot2 packages while exploring the 5,000 fastest-growing companies in the US in 2020, as compiled by Inc. magazine. One of the key principles of data visualization is the data-ink ratio. The concept behind the data-ink ratio is that non-data...
3046 sym R (3874 sym/11 pcs) 3 img 5 tbl
migration-inference
Introduction Given the extraordinary surge in migration in recent years, it is interesting to find out whether the demographics of those groups have changed. The data presented here do not account for refugees or population outflows, but they do provide the opportunity to explore some macro trends of migration, which in th...
7990 sym R (6777 sym/29 pcs) 6 img 7 tbl