Publications by Wingfeet
SAS PROC MCMC example 12 in R: Change point model
I have resumed working my way through the PROC MCMC examples. The SAS manual describes this example: Consider the data set from Bacon and Watts (1971), where y is the logarithm of the height of the stagnant surface layer and the covariate x is the logarithm of the flow rate of water. It is a simple example. It presented no problems...
Deaths in the Netherlands by cause and age
I downloaded counts of deaths by age, year and major cause from the Dutch statistics site. In this post I make some plots to look at causes and changes between the years. Data: Data from CBS. I downloaded the data in Dutch, hence the first thing to do was provide some kind of translation. The coding used seems slightly different from ICD...
More on causes of death in Netherlands over the years
Last week I had a post, ‘Deaths in the Netherlands by cause and age’. While creating that post I made one plot which I did not show. It shows something odd: there is vertical striping. Hence mortality varies by year across age. To examine this phenomenon further, here is a plot of some underlying causes. I would say the stripi...
Predicting Titanic deaths on Kaggle
Kaggle has a competition to predict who will die on the famous Titanic, ‘Machine Learning from Disaster’. It is listed as a knowledge competition, just up there to learn from. I am late to the party: it has been running for 1 1/2 years and is to end by the end of 2015. It is a small data set, hence interesting to learn from. It is also a competition with a number of ...
Predicting Titanic deaths on Kaggle II: gbm
Following my previous post I have decided to try a different method: generalized boosted regression models (gbm). I have read the background in Elements of Statistical Learning and Arthur Charpentier’s nice post on it. This data is a nice occasion to get my hands dirty. Data: Data as before. However, I have added some more...
Predicting Titanic deaths on Kaggle III: Bagging
This is the third post on predicting the deaths. The first one used randomForest, the second boosting (gbm). The aim of the third post was to use bagging. In contrast to the former posts I abandoned dplyr in this post. It gave some now-you-see-them, now-you-don’t errors. Data: The data is supposed to be the same as before. library(ipred) lib...
Predicting Titanic deaths on Kaggle IV: random forest revisited
On July 19th I used randomForest to predict the deaths on the Titanic in the Kaggle competition. Subsequently I found that both bagging and boosting gave better predictions than randomForest. I found this somewhat unsatisfactory, hence I am now revisiting randomForest. To my disappointment this does not result in predictions as good as b...