Publications by Rasmus Bååth
Bayesian Modeling of Anscombe’s Quartet
Anscombe’s quartet is a collection of four datasets that look radically different yet result in the same regression line when using ordinary least squares regression. The graph below shows Anscombe’s quartet with superimposed regression lines (taken from the Wikipedia article). While least squares regression is a good choice for dataset 1 (upper l...
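The claim that all four datasets share one least squares line is easy to check, because base R ships Anscombe's data as the `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`). This minimal sketch fits an ordinary least squares line to each dataset with `lm()`. It only reproduces the starting point of the post, not the Bayesian models themselves.

```r
# Anscombe's quartet is available in base R's datasets package
data(anscombe)

# Fit an ordinary least squares line y ~ x to each of the four datasets
fits <- lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  coef(lm(y ~ x))
})

# One row per dataset: intercept and slope
coefs <- do.call(rbind, fits)
rownames(coefs) <- paste("dataset", 1:4)
print(round(coefs, 2))
```

All four rows should show an intercept of about 3.00 and a slope of about 0.50, even though the scatter plots look nothing alike.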
Three Ways to Run Bayesian Models in R
There are different ways of specifying and running Bayesian models from within R. Here I will compare three different methods, two that rely on an external program and one that relies only on R. I won’t go into much detail about the differences in syntax; the idea is more to give a gist of how the different modeling languages look and feel...
useR 2013 was a blast!
I had a great time at useR 2013 in Albacete, Spain. The food was great, the people were fun and the weather was hot. A pleasant surprise was that I won the useR data analysis contest with my submission “Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model.” It was a fun exercise modeling football scores and I might wr...
Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model: Part one.
This is a slightly modified version of my submission to the UseR 2013 Data Analysis Contest which I had the fortune of winning 🙂 The purpose of the contest was to do something interesting with a dataset consisting of the match results from the last five seasons of La Liga, the premier Spanish football (aka soccer) league. In total there were 1...
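The excerpt does not spell out the model, so the following is only a sketch of the general idea under assumptions of our own: each team gets a skill parameter drawn from one shared normal distribution (the hierarchical part), and each side's goals are Poisson distributed with a log rate of `baseline + skill[attacker] - skill[defender]`. The hyperparameter values are hypothetical, and the code simulates one 20-team double round-robin season rather than fitting anything.

```r
set.seed(42)
n_teams <- 20

# Hypothetical hyperparameters, chosen for illustration only
baseline   <- 0.2   # log expected goals when two equally skilled teams meet
skill_mean <- 0
skill_sd   <- 0.3

# Hierarchical step: all team skills come from the same population distribution
skill <- rnorm(n_teams, skill_mean, skill_sd)

# Every team meets every other team once at home and once away (380 matches)
matches <- expand.grid(home = 1:n_teams, away = 1:n_teams)
matches <- matches[matches$home != matches$away, ]

# Log-linear Poisson rates: own skill raises the rate, opponent skill lowers it
lambda_home <- exp(baseline + skill[matches$home] - skill[matches$away])
lambda_away <- exp(baseline + skill[matches$away] - skill[matches$home])

matches$home_goals <- rpois(nrow(matches), lambda_home)
matches$away_goals <- rpois(nrow(matches), lambda_away)
print(head(matches))
```

In this sketch the home and away sides are treated symmetrically, so the model has no notion of home advantage.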
Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model: Part two.
In the last blog post I showed my initial attempt at modeling football results in La Liga using a Bayesian Poisson model, but there was one glaring problem with the model: it did not consider the advantage of being the home team. In this post I will show how to fix this! I will also show a way to deal with the fact that the dataset covers many L...
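One simple way to give the home side an advantage, assumed here for illustration rather than taken from the post, is to give home goals and away goals separate intercepts. In this sketch `home_baseline`, `away_baseline` and the helper `expected_goals()` are hypothetical; for two equally skilled teams the ratio of expected goals is `exp(home_baseline - away_baseline)`.

```r
set.seed(1)
n_teams <- 20
skill <- rnorm(n_teams, 0, 0.3)

# Hypothetical values: separate log-scale intercepts for the home and away sides
home_baseline <- 0.35
away_baseline <- 0.05

# Expected goals for both sides when team `home` hosts team `away`
expected_goals <- function(home, away) {
  c(home = exp(home_baseline + skill[home] - skill[away]),
    away = exp(away_baseline + skill[away] - skill[home]))
}

# Make teams 1 and 2 equally skilled so only the baselines differ
skill[1:2] <- 0
eg <- expected_goals(1, 2)
print(round(eg, 2))
```

With these made-up numbers the home side of an even match expects about 1.42 goals and the away side about 1.05.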
Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model: Part three.
In part one and part two of Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model I developed a model for the number of goals in football matches from five seasons of La Liga, the premier Spanish football league. I’m now reasonably happy with the model and want to use it to rank the teams in La Liga and to predict the o...
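Once a model produces expected goals for both sides of a match, outcome probabilities follow from the two Poisson distributions. This sketch assumes the two scores are independent and uses fixed, hypothetical rates; a full Bayesian prediction would instead repeat the calculation for each posterior draw (or simulate scores directly from the draws) and average. `outcome_probs()` and `max_goals` are names introduced here for illustration.

```r
# Probabilities of a home win, a draw and an away win, assuming independent
# Poisson-distributed goal counts with the given expected goals
outcome_probs <- function(lambda_home, lambda_away, max_goals = 15) {
  goals <- 0:max_goals
  # joint[i, j] = P(home scores goals[i]) * P(away scores goals[j])
  joint <- outer(dpois(goals, lambda_home), dpois(goals, lambda_away))
  c(home_win = sum(joint[lower.tri(joint)]),  # home goals > away goals
    draw     = sum(diag(joint)),              # equal scores
    away_win = sum(joint[upper.tri(joint)]))  # away goals > home goals
}

# Hypothetical expected goals for a single match
p <- outcome_probs(1.5, 1.0)
print(round(p, 3))
```

Truncating at 15 goals per side leaves out a negligible amount of probability for realistic scoring rates.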
The Bayesian Counterpart of Pearson’s Correlation Test
Except perhaps for the t test, a contender for the title “most used and abused statistical test” is Pearson’s correlation test. Whenever someone wants to check whether two variables are related, it is a safe bet (at least in psychology) that the first thing to be tested is the strength of a Pearson’s correlation. Only if that doesn’t work a ...
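The classical test is available in base R as `cor.test()`. As a small illustration on simulated data (the true correlation `rho = 0.6` is hypothetical), the sample Pearson's r estimates the correlation parameter of a bivariate normal distribution, which is the same parameter a Bayesian bivariate normal model puts a posterior distribution on.

```r
set.seed(123)
n   <- 200
rho <- 0.6  # hypothetical true correlation

# Construct bivariate normal data with correlation rho from two independent normals
z1 <- rnorm(n)
z2 <- rnorm(n)
x <- z1
y <- rho * z1 + sqrt(1 - rho^2) * z2

# Classical Pearson's correlation test
res <- cor.test(x, y, method = "pearson")
print(res$estimate)  # sample correlation r
print(res$p.value)   # p-value for the null hypothesis of zero correlation
```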
Bayesian Estimation of Correlation – Now Robust!
So in the last post I showed how to run the Bayesian counterpart of Pearson’s correlation test by estimating the parameters of a bivariate normal distribution. A problem with assuming normality is that the normal distribution isn’t robust against outliers. Let’s see what happens if we take the data from the last post with the finishing time...
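A quick way to see the non-robustness is to add a single point that goes against the trend and recompute Pearson's r. The data below are simulated and the outlier at (4, -4) is hypothetical. A common remedy, in Bayesian models as elsewhere, is to replace the normal with a heavier-tailed distribution such as a t distribution.

```r
set.seed(123)
n <- 30

# Hypothetical, clearly positively correlated data
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.6)
r_clean <- cor(x, y)

# Add one extreme point that goes against the trend
x_out <- c(x, 4)
y_out <- c(y, -4)
r_outlier <- cor(x_out, y_out)

print(round(c(clean = r_clean, with_outlier = r_outlier), 2))
```

A single point out of 31 is enough to pull the estimated correlation substantially toward zero.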
SPSS looked great! 20 years ago…
For some reason someone dropped a pamphlet advertising SPSS for Windows 3.0 in my mailbox at work. This means that the pamphlet, and the advertised version of SPSS, should be at least 20 years old! These days I’m happily using R for everything, but if I had needed to estimate any models 20 years ago, SPSS actually looked quite OK. In the early 90...
Going to Plot Some Proportions? Why not Flog ’em First?
Fractions and proportions can be difficult to plot nicely for a number of reasons: If the proportions are based on small counts (e.g., two of his three computing devices were Apple products) then the calculated proportions can only take on a small number of discrete values. Depending on what you have measured there might be many proportions close to ...
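The excerpt is cut off before the transformation is defined. As an assumption, this sketch follows John Tukey's folded log (flog): half the natural log-odds, applied to counts that are first "started" by adding 1/6 to both the successes and the failures, so that proportions of exactly 0 and 1 map to finite values. The function name `flog()` and its `start` argument are hypothetical and not necessarily what the post uses.

```r
# Folded log of a counted fraction, assuming Tukey's conventions:
# half the natural log-odds, computed on "started" counts
flog <- function(successes, n, start = 1/6) {
  # Add `start` to successes and to failures (so 2 * start to the total)
  p <- (successes + start) / (n + 2 * start)
  0.5 * log(p / (1 - p))
}

# Every possible proportion based on three observations
print(round(flog(0:3, 3), 2))
```

For counts of 0, 1, 2 and 3 out of 3 this gives roughly -1.47, -0.31, 0.31 and 1.47: symmetric around zero, with 0/3 and 3/3 no longer stuck at the edges of the scale.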