Publications by CL

Block diagonal matrices in R

13.04.2011

As far as I can tell, R doesn’t have a function for building block diagonal matrices, so as I needed one, I’ve coded it myself. It might save someone some time. Example: let m1 and m2 be two square matrices: m1=matrix(runif(10*10),nrow=10,ncol=10); m2=matrix(runif(5*5),nrow=5,ncol=5) By passing any number of matrices as argument...
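The excerpt is cut off before the function itself, so the following is only a minimal sketch of how such a helper could look, not the author's code. The name block_diag is hypothetical, and the sketch assumes every argument is a matrix with at least one row and one column.

```r
# Hypothetical helper (not the original post's function): builds a block
# diagonal matrix from any number of matrices passed as arguments.
block_diag <- function(...) {
  blocks <- list(...)
  stopifnot(all(vapply(blocks, is.matrix, logical(1))))

  n_rows <- vapply(blocks, nrow, integer(1))
  n_cols <- vapply(blocks, ncol, integer(1))

  # Start from an all-zero matrix large enough to hold every block.
  out <- matrix(0, nrow = sum(n_rows), ncol = sum(n_cols))

  # Last row/column index occupied by each block.
  row_end <- cumsum(n_rows)
  col_end <- cumsum(n_cols)

  for (i in seq_along(blocks)) {
    rows <- (row_end[i] - n_rows[i] + 1):row_end[i]
    cols <- (col_end[i] - n_cols[i] + 1):col_end[i]
    out[rows, cols] <- blocks[[i]]
  }
  out
}

# Same example matrices as in the excerpt.
m1 <- matrix(runif(10 * 10), nrow = 10, ncol = 10)
m2 <- matrix(runif(5 * 5), nrow = 5, ncol = 5)
b <- block_diag(m1, m2)
dim(b)
```

Preallocating the result and filling each block in place avoids repeated rbind/cbind calls. For sparse results, bdiag() in the Matrix package offers similar functionality.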


Graph Bisection in R

14.04.2011

Recently I had to partition a set of SNPs into a training set and a test set. Making a random split would not do: both sets would likely contain very similar SNPs due to linkage disequilibrium (LD), making them non-independent. So what I really needed was a partition that minimises the LD between the two sets. The problem is equivalent to bisecti...
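To make the objective concrete, here is a small sketch of the quantity a bisection tries to minimise: the total edge weight crossing the two sets. The function name cut_weight and the toy 4x4 matrix are made up for illustration; they stand in for a pairwise LD matrix and are not taken from the post.

```r
# Hypothetical helper: total weight of edges between set A and its complement.
# w    : symmetric matrix of pairwise weights (e.g. LD between SNPs)
# in_a : logical vector, TRUE for items assigned to set A
cut_weight <- function(w, in_a) {
  stopifnot(is.matrix(w), nrow(w) == ncol(w), length(in_a) == nrow(w))
  sum(w[in_a, !in_a, drop = FALSE])
}

# Toy weights: items 1-2 are strongly linked, as are items 3-4.
w <- matrix(c(0.0, 0.9, 0.1, 0.0,
              0.9, 0.0, 0.0, 0.2,
              0.1, 0.0, 0.0, 0.8,
              0.0, 0.2, 0.8, 0.0), nrow = 4, byrow = TRUE)

good_split <- cut_weight(w, c(TRUE, TRUE, FALSE, FALSE))  # keeps linked pairs together
bad_split  <- cut_weight(w, c(TRUE, FALSE, TRUE, FALSE))  # separates linked pairs
good_split < bad_split
```

A bisection algorithm would search over balanced assignments for the one with the smallest cut weight; this sketch only scores a given assignment.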


An unsurprising year

31.05.2011

I received one of those FW:…:FW emails yesterday with the following text: 2011 is an unusual year. Add the last two digits of your birth year to the age you will turn on your birthday this year and you’ll get 111! (…) This year, July has 5 Fridays, 5 Saturdays and 5 Sundays. This happens once every 623 years. This is called money bags. So...
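The excerpt stops before the author's own analysis, so what follows is an independent quick check rather than the post's code. It assumes birth years between 1900 and 1999 for the first claim, and relies on the fact that a 31-day July has five Fridays, Saturdays and Sundays exactly when 1 July falls on a Friday. The variable names and the 2000 to 2030 range are arbitrary choices for illustration.

```r
# Claim 1: last two digits of the birth year + age turned in 2011 = 111.
# For anyone born in the 1900s this is just (yy) + (2011 - 1900 - yy).
birth_year <- 1900:1999
age_in_2011 <- 2011 - birth_year
claim_one_holds <- all(birth_year %% 100 + age_in_2011 == 111)

# Claim 2: find the years in which 1 July is a Friday.
# "%u" gives the ISO weekday as a string, "1" (Monday) to "7" (Sunday).
years <- 2000:2030
july_first <- as.Date(sprintf("%d-07-01", years))
five_weekend_years <- years[format(july_first, "%u") == "5"]

claim_one_holds
five_weekend_years
```

Because the second check lists several years within a few decades, it shows directly how often the "once every 623 years" pattern actually recurs.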


Two Castles Run 2011

12.06.2011

I did the Two Castles Run today; it’s a 10km race between Warwick and Kenilworth castles. The organizers were very quick to put the results online and even went the extra mile of offering them as a CSV file. It was therefore very tempting to launch R and see what the distribution looked like (and how I fared compared to the rest of the runners)....


A first go at ‘manipulate’ in RStudio

26.08.2011

Something I’m missing from R (especially coming from Mathematica) is the ability to quickly build interactive graphs, which I find very useful for getting a good intuition of the impact of parameters on a mathematical function. Richie Cotton’s post about interactive plots in R gave me an incentive to have a go at the manipulate package in RSt...


An exercise in plyr and ggplot2 using triathlon results

10.10.2011

I ran my last triathlon for this year a couple of weeks ago, in the beautiful town of Stratford-upon-Avon. The results were online the day after, so I decided to have a look at my fellow competitors’ times, which gave me an opportunity to flex my plyr and ggplot2 muscles. The data itself was in PDF, so it was a bit of a pain to extract in usable...


plyr, ggplot2 and triathlon results, part II

13.10.2011

I ended my previous post by mentioning how one could imagine other ways of looking at the triathlon data with plyr and ggplot2. I couldn’t help but carry on playing with it so here are more stats and graphs from the same dataset: the results of a local sprint triathlon. This post will have a slightly more statistical bent to it. First we load ...


Anarchy Golf! And that’s your Sunday gone.

29.10.2011

I like to follow good practice when I program. I want my code to be readable, properly indented, modular and re-usable. And I want my variables to have descriptive names. There’s nothing that I moderately dislike more than arbitrary abbreviations and inconsistent style. I have to say that R is not the best example when it comes to style. E...


The mysterious case of the misbehaving writeLines() (or: a cat saves the day)

10.11.2011

Dear readers and R experts, I submit to you a mysterious R quirk which has been baffling me for the best part of a week. I found a work-around but I’d love it if someone could explain this strangest of behaviours. I was using writeLines() to dump a large number of classification results and noticed that the output file was not as ...


Winning from losing

19.01.2012

By following Twitter’s #rstats hashtag (rss feed), I recently came across a very interesting R-related blog: datanalytics.com. The first post I read from it was about setting up an on-line reading group to go through the excellent “The Elements of Statistical Learning”. It is ongoing and you can find related posts here. Something you shoul...
