Publications by mrtnj

A slightly different introduction to R, part I

19.01.2013

Note (originally in Swedish): I hope the reader will excuse me for writing in English now and then. This will be a brief introduction to using the statistics software R for biologists who want to do some of their data analysis in R. There are plenty of introductions to R (see here and here, for example; these are just a couple of intros that make some good ...

9283 sym R (918 sym/5 pcs) 18 img

A slightly different introduction to R, part II

27.01.2013

In part I, we looked at importing data into R and simple ways to manipulate data frames. Once we’ve gotten our data safely into R, the first thing we want to do is probably to make some plots. Below, we’ll make some simple plots of the made-up comb gnome data. If you want to play along, load the same file we used for part I. data <- read.csv(...

10518 sym R (2129 sym/27 pcs) 30 img

Using R: writing a table with odd lines (GFF track headers)

28.01.2013

The other day, I wanted to add track lines to a GFF file, so that I could view different features as separate custom tracks in a genome browser. The need to shuffle genome coordinates between different file formats seems to occur all the time when you deal with some kind of bioinformatic data. It’s usually just text files; one just has to keep ...

1238 sym R (1302 sym/2 pcs) 14 img

Using R: writing a table with odd lines (again)

31.01.2013

Let’s look at my GFF track headers again. Why not do it with plyr instead? d_ply splits the data frame by the feature column and applies an anonymous function that writes subsets to the file (and returns nothing, hence the "_" in the name). This isn’t shorter or necessarily better, but it appeals to me. library(plyr) connection <- file("sep...

783 sym R (282 sym/1 pcs) 14 img
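The split-and-write idea in the excerpt above can be sketched roughly as follows. This is a base-R analogue rather than the post's own code: it swaps plyr's d_ply for split() and a for loop so that it runs without extra packages. The toy data frame, its column names, the temporary output file and the format of the track line are all assumptions made for illustration.

```r
# Hypothetical toy annotation; column names are assumptions, not from the post.
features <- data.frame(
  seqname = c("chr1", "chr1", "chr2", "chr2"),
  feature = c("gene", "exon", "gene", "exon"),
  start   = c(100, 150, 300, 320),
  end     = c(500, 200, 900, 400),
  stringsAsFactors = FALSE
)

# Write to a temporary file through an open connection, so that each
# write continues where the previous one stopped.
out_file <- tempfile(fileext = ".gff")
connection <- file(out_file, open = "w")

# Split by feature type; write one track line, then the rows of that subset.
for (part in split(features, features$feature)) {
  writeLines(paste0("track name=", part$feature[1]), connection)
  write.table(part, file = connection, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}
close(connection)

# Read the result back to inspect it.
written <- readLines(out_file)
```

With plyr installed, the loop body could presumably be passed to d_ply as the function applied to each subset; the base-R loop is used here only to keep the sketch self-contained.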

A slightly different introduction to R, part III

02.02.2013

I think you’ve noticed by now that a normal interactive R session is quite messy. If you don’t believe me, try playing around for a while and then run the history() command, which will show you the commands you’ve typed. If you’re anything like me, a lot of them are malformed attempts that generated some kind of error message. Hence, eve...

7630 sym R (1040 sym/8 pcs) 18 img

Using R: accessing PANTHER classifications

10.02.2013

Importing, subsetting, merging and exporting various text files with annotation (in the wide sense, i.e. anything that might help when interpreting your experiment) is not computation and it’s not biology either, but it’s housekeeping that needs to be done. Everyone has a weapon of choice for general-purpose scripting and mine is R. Yes, this...

4110 sym R (1923 sym/6 pcs) 14 img

More Haskell: a bootstrap

16.02.2013

So my playing around with Haskell goes on. You can follow the progress of the little bootstrap exercise on GitHub. Now it’s gotten to the point where it actually computes a bootstrap interval for the mean of a sample. Consider the following R script: n <- 100 fake.data <- data.frame(group=rep(1, n), data=rpois(n, 10)) write.table(fake.data, quote=F...

1584 sym R (897 sym/3 pcs) 14 img
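For comparison, a bootstrap interval for the mean of a sample can be sketched in a few lines of R. This is a minimal percentile-interval sketch, not the Haskell program from the post. The seed, the number of replicates and the 95% level are assumptions chosen for illustration; the simulated Poisson data mirror the fake data in the excerpt above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Simulated sample, as in the excerpt: 100 Poisson draws with mean 10.
n <- 100
fake_data <- rpois(n, 10)

# Resample with replacement and record the mean of each resample.
n_replicates <- 1000
boot_means <- replicate(n_replicates,
                        mean(sample(fake_data, replace = TRUE)))

# Percentile interval: the 2.5% and 97.5% quantiles of the bootstrap means.
interval <- quantile(boot_means, probs = c(0.025, 0.975))
```

The percentile interval is only one of several ways to build a bootstrap interval; it is used here because it is the shortest to write down.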

A slightly different introduction to R, part IV

21.02.2013

Now, after reading in data, making plots and organising commands with scripts and Sweave, we’re ready to do some numerical data analysis. If you’re following this introduction, you’ve probably been waiting for this moment, but I really think it’s a good idea to start with graphics and scripting before statistical calculations. We’ll use...

8092 sym R (3068 sym/19 pcs) 32 img

Using R: Correlation heatmap with ggplot2

21.03.2013

Just a short post to celebrate that I learned today how incredibly easy it is to make a heatmap of correlations with ggplot2 (and reshape2, of course). data(attitude) library(ggplot2) library(reshape2) qplot(x=Var1, y=Var2, data=melt(cor(attitude)), fill=value, geom="tile") So, what is going on in that short passage? cor makes a correlation matr...

1069 sym R (123 sym/1 pcs) 16 img
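The one-liner in the excerpt above can be unpacked into separate steps. The sketch below is an assumption-laden variant, not the post's code: it builds the long-format table with base R (as.data.frame(as.table(...)) instead of reshape2's melt) and uses ggplot() with geom_tile() instead of qplot. The plotting step only runs if ggplot2 is installed.

```r
data(attitude)

# Long format: one row per pair of variables, similar to melt(cor(attitude)).
correlations <- as.data.frame(as.table(cor(attitude)))
names(correlations) <- c("Var1", "Var2", "value")

# Draw one tile per pair, filled by the correlation coefficient.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  heatmap_plot <- ggplot(correlations,
                         aes(x = Var1, y = Var2, fill = value)) +
    geom_tile()
}
```

Typing heatmap_plot at the prompt (or calling print(heatmap_plot) in a script) draws the heatmap.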

Using R: reading tables that need a little cleaning

24.03.2013

Sometimes one needs to read tables that are a bit messy, so that read.table doesn’t immediately recognize the content as numerical. Maybe some weird characters are sprinkled in the table (ever been given a table with significance stars in otherwise numerical columns?). Some search and replace is needed. You can do this by hand, and I know this ...

2280 sym R (475 sym/2 pcs) 14 img
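One way to do the search and replace in R, sketched under assumptions: read the file as plain lines, strip the unwanted characters with gsub, and hand the cleaned text to read.table. The excerpt above is truncated, so this may not be the approach the post takes; the messy input, the column names and the choice of stars as the unwanted character are all hypothetical.

```r
# Hypothetical messy input: significance stars mixed into a numeric column.
# With a real file, readLines("some_file.txt") would produce such a vector.
messy_lines <- c(
  "gene\testimate\tp",
  "a\t1.52*\t0.03",
  "b\t0.20\t0.61",
  "c\t3.10***\t0.0002"
)

# Remove every literal star; fixed = TRUE avoids treating "*" as a regex.
cleaned_lines <- gsub("*", "", messy_lines, fixed = TRUE)

# Parse the cleaned text; the estimate column is now recognised as numeric.
cleaned <- read.table(text = cleaned_lines, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
```

After this, str(cleaned) should show estimate and p as numeric columns rather than character.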