Publications by Luis

On R versus SAS

06.10.2011

A short while ago there was a discussion on LinkedIn about the use of SAS versus R for the enterprise. I have thought a bit about the issue but, as I do not use LinkedIn, I did not make any comments there. Disclaimer: I did use SAS a lot between 1992 and 1997, mostly for genetic evaluation, heavily relying on BASE, STAT, IML and GRAPH. From that...

5167 sym

All combinations for levelplot

07.10.2011

In a previous post I explained how to create all possible combinations of the levels of two factors using expand.grid(). Another use for this function is to create a regular grid for two variables to create a levelplot or a contour plot. For example, let’s say that we have fitted a multiple linear regression to predict wood stiffness (stiff, th...

1340 sym R (243 sym/1 pcs) 2 img
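A minimal sketch of the grid idea in the entry above, not the post's own code. The data are simulated, the predictor names `x1` and `x2` are hypothetical, and only the response name `stiff` comes from the excerpt.

```r
library(lattice)  # recommended package that ships with R

# Simulated data: stiffness as a linear function of two made-up predictors
set.seed(42)
wood <- data.frame(x1 = runif(50, 300, 600), x2 = runif(50, 2, 6))
wood$stiff <- 1 + 0.01 * wood$x1 + 0.5 * wood$x2 + rnorm(50, sd = 0.3)

fit <- lm(stiff ~ x1 + x2, data = wood)

# Regular 20 x 20 grid covering the observed range of both predictors
grid <- expand.grid(
  x1 = seq(min(wood$x1), max(wood$x1), length.out = 20),
  x2 = seq(min(wood$x2), max(wood$x2), length.out = 20)
)

# Predict on the grid and plot the fitted surface
grid$pred <- predict(fit, newdata = grid)
lp <- levelplot(pred ~ x1 * x2, data = grid)
print(lp)
```

Swapping `levelplot()` for `contourplot()` on the same grid should give the contour version.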

A brief idea of style

08.10.2011

Once one starts writing more R code the need for consistency increases, as it facilitates managing larger projects and their maintenance. There are several style guides or suggestions for R; for example, Andrew Gelman’s, Hadley Wickham’s, Bioconductor’s and this one. I tend to write closer to Google’s R style guide, which contains some he...

1644 sym

Operating on datasets inside a function

09.10.2011

There are times when we need to write a function that makes changes to a generic data frame that is passed as an argument. Let’s say, for example, that we want to write a function that converts to factor any variable with names starting with a capital letter. There are a few issues involved in this problem, including: Obtaining a text version ...

1475 sym R (1111 sym/1 pcs)
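One possible way to write the function described in the entry above, assuming the data frame is simply returned after modification. The function name `capitals_to_factor` and the example data are hypothetical, and the original post may take a different route.

```r
# Convert to factor every column whose name starts with a capital letter
capitals_to_factor <- function(df) {
  is_cap <- grepl("^[[:upper:]]", names(df))
  df[is_cap] <- lapply(df[is_cap], factor)
  df
}

# Made-up example: Block and Family should become factors, height stays numeric
trial <- data.frame(
  Block = c(1, 1, 2, 2),
  Family = c("a", "b", "a", "b"),
  height = c(2.1, 2.4, 1.9, 2.6),
  stringsAsFactors = FALSE
)
trial <- capitals_to_factor(trial)
str(trial)
```

Returning the modified copy and reassigning it avoids having to work with the name of the data frame inside the function.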

Reading HTML pages in R for text processing

10.10.2011

We were talking with one of my colleagues about doing some text analysis (which, by the way, I have never done before), for which the first issue is to get text into R. Not just any text, but files that can be accessed through the internet. In summary, we need to access an HTML file, parse it so we can access specific content and then remove the HTML tags. ...

1226 sym R (544 sym/1 pcs)
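A crude base-R sketch of the last step in the entry above (removing the HTML tags), using a made-up page held in a character vector. It strips tags with a regular expression rather than a real parser, which is an assumption of this example and not necessarily what the post does. In practice the lines would come from something like `readLines()` on a URL.

```r
# Made-up HTML content, one element per line of the file
html <- c(
  "<html><body>",
  "<h1>Title</h1>",
  "<p>Some <b>bold</b> text.</p>",
  "</body></html>"
)

page <- paste(html, collapse = " ")

# Replace every tag with a space, then collapse runs of whitespace
text <- gsub("<[^>]+>", " ", page)
text <- gsub("\\s+", " ", text)
text <- trimws(text)
text
```

A regular expression like this breaks on scripts, comments and malformed markup, so a proper HTML parser is the safer choice for real pages.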

Upgrading R (and packages)

10.10.2011

I tend not to upgrade R very often (running from 6 months to 1 year behind in version numbers) because I have to reinstall all packages: a real pain. A quick search shows that people have come up with good solutions to this problem, as presented in this Stack Overflow thread. I used the code on my Mac: # Run in the old installation (pr...

1397 sym R (1091 sym/2 pcs)

Setting plots side by side

11.10.2011

This is simple example code to display side-by-side lattice plots or ggplot2 plots, using the mtcars dataset that comes with any R installation. We will display a scatterplot of miles per US gallon (mpg) on car weight (wt) next to another scatterplot of the same data, but using different colors by number of engine cylinders (cyl, treated as facto...

1471 sym R (1098 sym/2 pcs) 4 img
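A minimal sketch of the lattice half of the entry above, assuming the `split` and `more` arguments of `print()` for trellis objects are used to place the panels. The ggplot2 version is not covered here.

```r
library(lattice)  # recommended package that ships with R

# Plain scatterplot of miles per gallon on car weight
p1 <- xyplot(mpg ~ wt, data = mtcars)

# Same data, colored by number of cylinders treated as a factor
p2 <- xyplot(mpg ~ wt, groups = factor(cyl), data = mtcars, auto.key = TRUE)

# split = c(column, row, number of columns, number of rows)
print(p1, split = c(1, 1, 2, 1), more = TRUE)  # left panel, keep the page open
print(p2, split = c(2, 1, 2, 1))               # right panel
```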

Simulating data following a given covariance structure

12.10.2011

Every year there are at least a couple of occasions when I have to simulate multivariate data that follow a given covariance matrix. For example, let’s say that we want to create an example of the effect of collinearity when fitting multiple linear regressions, so we want to create one variable (the response) that is correlated with a number of ...

2599 sym R (3305 sym/2 pcs)
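One common way to do what the entry above describes is to multiply independent standard normals by the Cholesky factor of the target matrix. This is a sketch under that assumption, with an illustrative covariance matrix, and the post itself may use a different method.

```r
set.seed(2011)

# Illustrative target covariance matrix (values are made up)
Sigma <- matrix(c(1.0, 0.6, 0.3,
                  0.6, 1.0, 0.5,
                  0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)

n <- 10000
Z <- matrix(rnorm(n * 3), nrow = n)  # independent standard normal columns
R <- chol(Sigma)                     # upper triangular, t(R) %*% R equals Sigma
X <- Z %*% R                         # rows of X have covariance close to Sigma

# Sample covariance, to compare against Sigma
round(cov(X), 2)
```

The sample covariance only approximates `Sigma`, and the agreement improves as `n` grows.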

Maximum likelihood

13.10.2011

This post is one of those ‘explain to myself how things work’ documents, which are not necessarily completely correct but are close enough to facilitate understanding. Background Let’s assume that we are working with a fairly simple linear model, where we only have a response variable (say tree stem diameter in cm). If we want to ‘guess...

3304 sym R (1870 sym/3 pcs) 36 img
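A small sketch in the spirit of the entry above: estimating the mean and standard deviation of a normally distributed response by numerically maximizing the likelihood. The diameters are simulated, and the names `diam`, `negloglik` and `mle` are hypothetical.

```r
set.seed(1)
diam <- rnorm(200, mean = 30, sd = 5)  # simulated stem diameters (cm)

# Negative log-likelihood of a normal model.
# The standard deviation is optimized on the log scale so it stays positive.
negloglik <- function(par, y) {
  -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# optim() minimizes, so minimizing the negative log-likelihood maximizes the likelihood
fit <- optim(c(20, 1), negloglik, y = diam)
mle <- c(mean = fit$par[1], sd = exp(fit$par[2]))
mle

# Closed-form maximum likelihood estimates for comparison
c(mean = mean(diam), sd = sqrt(mean((diam - mean(diam))^2)))
```

The numerical and closed-form estimates should agree closely. Note that the maximum likelihood estimate of the standard deviation divides by n rather than n - 1.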

Linear mixed models in R

16.10.2011

A substantial part of my job has little to do with statistics; nevertheless, a large proportion of the statistical side of things relates to applications of linear mixed models. The bulk of my use of mixed models relates to the analysis of experiments that have a genetic structure. A brief history of time At the beginning (1992-1995) I would use...

5962 sym R (4948 sym/4 pcs)