Publications by Luis
Surviving a binomial mixed model
A few years ago we had this really cool idea: we had to establish a trial to understand wood quality in context. Sort of following the saying “we don’t know who discovered water, but we are sure that it wasn’t a fish” (attributed to Marshall McLuhan). By now you are thinking WTF is this guy talking about? But the idea was simple; let’s ...
Do we need to deal with ‘big data’ in R?
David Smith at the Revolutions blog posted a nice presentation on “big data” (oh, how I dislike that term). It is a nice piece of work and the Revolution guys manage to process a large number of records, starting with a download of 70GB and ending up with a series of linear regressions. I’ve spent the last two weeks traveling and finishing ...
If you are writing a book on Bayesian statistics
This post is somewhat marginal to R in that there are several statistical systems that could be used to tackle the problem. Bayesian statistics is one of those topics that I would like to understand better, much better, in fact. Unfortunately, I struggle to get the time to attend courses on the topic between running my own lectures, research and ...
On the (statistical) road, workshops and R
Things have been a bit quiet at Quantum Forest during the last ten days. Last Monday (Sunday for most readers) I flew to Australia to attend a couple of one-day workshops; one on spatial analysis (in Sydney) and another one on modern applications of linear mixed models (in Wollongong). This will be followed by attending The International Biometri...
R, academia and the democratization of statistics
I am not a statistician but I use statistics, teach some statistics and write about applications of statistics in biological problems. Last week I was at a biostatistics conference, talking with a Ph.D. student who was surprised by this situation because I didn’t have any statistical training. I corrected that to “any formal training”. On the...
Tall big data, wide big data
After attending two one-day workshops last week I spent most days paying attention to (well, at least listening to) presentations at this biostatistics conference. Most presenters were R users, although Genstat, Matlab and SAS fans were also present, and not once did I hear “I can’t deal with the current size of my data sets”. ...
R pitfall #3: friggin’ factors
I received an email from one of my students expressing deep frustration with a seemingly simple problem. He had a factor containing names of potato lines and wanted to set some levels to NA. Using simple letters as example names, he was baffled by the result of the following code: lines = factor(LETTERS) lines # [1] A B C D E F G H... # Levels: A B...
First impressions of Doing Bayesian Data Analysis
About a month ago I was discussing the approach that I would like to see in introductory Bayesian statistics books. In that post I mentioned a PDF copy of Doing Bayesian Data Analysis by John K. Kruschke and that I had ordered the book. Well, recently a parcel was waiting in my office with a spanking new, real paper copy of the book. A few days ...
An R wish list for 2012
I expect there will be many reviews and wish lists for R this year, with many of them focusing on either running speed or dealing with large data sets. However, most issues that I would like to see tackled in R next year are not technical but, for lack of a better word, social. Many users will first encounter R through the r-project.org website. ...
Plotting earthquake data
Since 4th September 2010 we have had over 2,800 quakes (considering only magnitude 3+) in Christchurch. Quakes come in swarms, with one or a few strong shocks, followed by numerous smaller ones and then the occasional shock, creating an interesting data visualization problem. In our case, we have had swarms in September 2010, December 2010, February...