Publications by nsaunders

Popular topics at the BioStar Q&A site

23.08.2011

Which topics are the most popular at the BioStar bioinformatics Q&A site? One source of data is the tags used for questions. Tags are somewhat arbitrary of course, but fortunately BioStar has quite an active community, so “bad” tags are usually edited to improve them. Hint: if your question is “How to find SNPs”, then tagging it with “...
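
Once the tags have been retrieved, counting them takes only a few lines. The sketch below is minimal and makes two assumptions. First, the tags for each question are already available as one semicolon-separated string per question; fetching them from BioStar is not shown. Second, the example tags are invented for illustration and are not real BioStar data.

```r
# Hypothetical tags for four questions, one string per question, with tags
# separated by ";" (illustrative only; not real BioStar data).
question.tags <- c("snp;vcf", "rna-seq;alignment", "snp;annotation", "rna-seq")

# Split each string into its individual tags and flatten into one vector
tags <- unlist(strsplit(question.tags, ";", fixed = TRUE))

# Count how often each tag occurs, most frequent first
tag.counts <- sort(table(tags), decreasing = TRUE)
print(tag.counts)
```

With real data, the top of `tag.counts` (for example `head(tag.counts, 20)`) lists the most popular topics.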

Interacting with bioinformatics webservers using R

08.09.2011

In an ideal world, all bioinformatics tools would be made available via the Web as a web service with an API, as well as a standalone package to download for local use. This is rarely the case and sometimes, even where one or the other is available, factors such as cost come into play. So we resort to web scraping; writing code to interact with...
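
Some server forms accept their parameters in a GET query string. For those, base R can assemble the request URL directly. This is a minimal sketch, not the method from the post: the server URL, the parameter names and the helper `build_query_url()` are all hypothetical. Forms that require POST need a different approach, for example `POST()` from the httr package.

```r
# Hypothetical helper: build a GET query URL for a web form.
# Assumes the (made-up) server accepts its parameters in the query string.
build_query_url <- function(base.url, params) {
  # Percent-encode each value; reserved = TRUE also encodes characters such as & and =
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0(base.url, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

query <- build_query_url("https://example.org/tool.cgi",
                         list(seq = "MKV LLA", format = "text"))
print(query)
# The response could then be fetched and parsed, e.g.:
# result <- readLines(query)
```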

A Friday round-up

01.12.2011

Just a brief selection of items that caught my eye this week. Note that this is a Friday round-up, as opposed to the Friday round-up, lest you mistake it for a new, regular feature. 1. R/statistics: ggbio, a new Bioconductor package which builds on the excellent ggplot graphics library, for the visualization of biological data. R development master class: Hadley Wickha...

Simple plots reveal interesting artifacts

14.03.2012

I’ve recently been working with methylation data; specifically, from the Illumina Infinium HumanMethylation450 BeadChip. It’s a rather complex array which uses two types of probes to determine the methylation state of DNA at ~485,000 sites in the genome. The Bioconductor project has risen to the challenge with a (somewhat bewildering) varie...

R gotcha for the week

15.03.2012

I use the biomaRt package from Bioconductor in almost every R session. So I thought I’d load the library and set up a mart instance in my ~/.Rprofile:

library(biomaRt)
mart.hs <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

On starting R, I was somewhat perplexed to see this error message: Error in bmVersion(mart, verbose = ...
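
One possible workaround is to defer the `useMart()` call with base R's `delayedAssign()`. This assumes the error arises because only the base package is attached while ~/.Rprofile is being sourced, a restriction that R's ?Startup documentation notes for the site and user profile files. The sketch is minimal and is not necessarily the fix described in the post.

```r
# Sketch for ~/.Rprofile. Assumption: the error occurs because only the base
# package is attached while the profile is sourced.
# delayedAssign() stores a promise: useMart() is NOT called at startup, only
# the first time mart.hs is used, by which point the default packages are attached.
# biomaRt:: is used so the package need not be attached at startup.
delayedAssign("mart.hs",
              biomaRt::useMart(biomart = "ensembl",
                               dataset = "hsapiens_gene_ensembl"))
```

Because `mart.hs` is a promise, the Ensembl query runs only on first use. One caveat: `save()` evaluates promises by default, so saving the workspace on exit would also trigger the query.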

My day out at #osddmalaria

10.05.2012

Finally, I get around to telling you that… …on Friday 24th February, I took a day out from my regular job to attend a meeting on Open Source Drug Discovery for Malaria. I should state straight away that whilst drug discovery and chem(o)informatics are topics that I find very interesting, I have no professional experience or connections in eit...

Twitter coverage of the ISMB 2012 meeting: some statistics

15.08.2012

OK, let’s do this: some statistics and visualization of the tweets for ISMB 2012. First, thanks to Stephen Turner who got things started in this post at his excellent blog, Getting Genetics Done. Subscribe to his feed if you don’t already do so. I’ve created a GitHub repository for this project (and future Twitter-related work). If you’d...

Custom CSS for HTML generated using RStudio

26.08.2012

People have been telling me for a while that the latest version of RStudio, the IDE for R, is a great way to generate reports. I finally got around to trying it out and, for once, the hype is justified. Start with this excellent tutorial from Jeromy Anglim. Briefly: the process is not so different to Sweave, except that (1) instead of embedding R ...

Addendum to yesterday’s post on custom CSS and R Markdown

27.08.2012

Updates from RStudio support: (1) “Thanks for reporting and I was able to reproduce this as well. I’ve filed a bug and we’ll take a look.” (2) “Taking a further look, this is actually a bug in the Markdown package and we’ve asked the maintainer (Jeffrey Horner) to look into it.” As juejung points out in a comment on my previous post, app...

Basic R: rows that contain the maximum value of a variable

12.02.2013

File under “I keep forgetting how to do this basic, frequently-required task, so I’m writing it down here.” Let’s create a data frame which contains five variables, vars, named A – E, each of which appears twice, along with some measurements:

df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = 1:10, obs2 = 11:20)
df.orig
# ...
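
The sketch below shows one base-R way to keep, for each value of vars, the rows where obs1 is at its maximum. It is not necessarily the approach taken in the post. `ave()` returns the group maximum for every row, and comparing obs1 against it selects the matching rows. For this data frame that selects rows 6 to 10, where obs1 runs from 6 to 10.

```r
# Recreate the data frame from the post: vars A-E, each appearing twice
df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = 1:10, obs2 = 11:20)

# ave() returns, for every row, the maximum of obs1 within that row's group;
# keeping rows where obs1 equals it gives the per-group maxima
is.max <- df.orig$obs1 == ave(df.orig$obs1, df.orig$vars, FUN = max)
df.max <- df.orig[is.max, ]
print(df.max)
```

Unlike `which.max()`, which returns only the first maximum, the `==` comparison keeps every row that ties for a group's maximum.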