Publications by Wingfeet
Predicting Titanic deaths on Kaggle IV: random forest revisited
On July 19th I used randomForest to predict the deaths on the Titanic in the Kaggle competition. Subsequently I found that both bagging and boosting gave better predictions than randomForest. I found this somewhat unsatisfactory, hence I am now revisiting randomForest. To my disappointment this does not result in predictions as good as b...
7445 sym 14 img
Predicting Titanic deaths on Kaggle V: Ranger
In two previous posts (Predicting Titanic deaths on Kaggle IV: random forest revisited, Predicting Titanic deaths on Kaggle) I was unable to make random forest predict as well as boosting. Hence, when I read about an alternative implementation, ranger, I took the opportunity to check whether ranger could improve the predictions. The clai...
7241 sym 6 img
Predicting Titanic deaths on Kaggle VI: Stan
It is a bit of a contradiction. Kaggle provides competitions on data science, while Stan is clearly part of (Bayesian) statistics. Yet after using random forests, boosting and bagging, I also think this problem has a suitable size for Stan, which I understand can handle larger problems than older Bayesian software such as JAGS. What I...
18447 sym
Predicting Titanic deaths on Kaggle VII: More Stan
Two weeks ago I used Stan to create predictions after just throwing in all independent variables. This week I aim to refine the Stan model. For this it is convenient to use the loo package (Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models). See also the paper by Aki Vehtari, Andrew Gelman and Jonah Gabry. Since the package d...
6619 sym
Trying to optimize
I wanted to try some more machine learning. On Kaggle there is a competition, How Much Did It Rain? II. This is a considerably bigger data set than Titanic. To quote from Kaggle: Rainfall is highly variable across space and time, making it notoriously tricky to measure. Rain gauges can be an effective measurement tool for a specific locatio...
12910 sym 1 tbl
Vacancies in Europe
I like playing around with data from Eurostat. These days the tools to do so are just so easy. The eurostat package pulls the data directly from the database into R. Process it a bit using dplyr and, before you know it, ggplot makes a plot. Data: My starting point to examine data is the database page. From there I can bro...
4882 sym 4 img