Publications by nsaunders

Beware of rogue header files (Bioconductor installation)

11.05.2010

Just a short note concerning a “gotcha”. As I have done many times before, I opened an R console on my Ubuntu machine, newly upgraded to Lucid (10.04), typed source("http://bioconductor.org/biocLite.R") and began a Bioconductor install with biocLite(). Only this time, I saw this: Error in dyn.load(file, DLLpath = DLLpath, ...) : unable to load...

biomaRt and GenomeGraphs: a worked example

06.06.2010

As promised a few posts ago, here is another demonstration of the excellent biomaRt package, this time in conjunction with GenomeGraphs. Here’s what we’re going to do: grab some public microarray data; normalise it and get a list of the most differentially-expressed probesets; use biomaRt to fetch the genes associated with those probesets; plot the data u...

Analysing the ISMB 2010 meeting using R

20.07.2010

The colossus of bioinformatics meetings, ISMB, convened in Boston this year from July 9–13. As in recent years, the meeting was covered online at its website, at FriendFeed and on Twitter. I thought it would be fun to run a quick analysis of activity in the FriendFeed room using R. 1. Fetch the data. We can use the FriendFeed API to fetch data in...

A brief introduction to “apply” in R

19.08.2010

At any R Q&A site, you’ll frequently see an exchange like this one: Q: How can I use a loop to […insert task here…] ? A: Don’t. Use one of the apply functions. So, what are these wondrous apply functions and how do they work? I think the best way to figure out anything in R is to learn by experimentation, using embarrassingly trivia...

Abstract word clouds using R

23.08.2010

A recent question over at BioStar asked whether abstracts returned from a PubMed search could easily be visualised as “word clouds” using Wordle. This got me thinking about ways to solve the problem using R. Here’s my first attempt, which demonstrates some functions from the RCurl and XML packages. Update: corrected a couple of copy/paste...

GEO database: curation lagging behind submission?

30.08.2010

Figure: GSE and GDS records in GEOmetadb by date

I was reading an old post that describes GEOmetadb, a downloadable database containing metadata from the GEO database. We had a brief discussion in the comments about the growth in GSE records (user-submitted) versus GDS records (curated datasets) over time. Below is some quick-and-dirty R code to examine...

Connecting to a MongoDB database from R using Java

24.09.2010

It would be nice if there were an R package for MongoDB along the lines of RMySQL. For now there is not, so how best to get data from a MongoDB database into R? One option is to retrieve JSON via the MongoDB REST interface and parse it using the rjson package. Assuming, for example, that you have retrieved your CiteULike collection in JS...

BioStar users (of the world, unite)

09.10.2010

Egon writes: “Can someone please plot the BioStar users on a Google Map?” Sounds like a challenge. Let’s go. 1. Harvesting user IP addresses. BioStar user profiles (here’s mine) include a location field. It’s free text and optional, which means that location is missing or inaccurate for many users. However, if you’re logged into BioSta...

Findings increasingly novel, scientists say…

29.10.2010

…was the tongue-in-cheek title of an image that I posted to Twitpic this week. It shows the usage of the word “novel” in PubMed article titles over time. As someone correctly pointed out at FriendFeed, the counts need to be normalised to total publications per year. The image was inspired by a couple of items that caught my attention. First, a questio...

Analysis of retractions in PubMed

30.11.2010

As so often happens these days, a brief post at FriendFeed got me thinking about data analysis. Entitled “So how many retractions are there every year, anyway?”, the post links to this article at Retraction Watch. It discusses ways to estimate the number of retractions and in particular, a recent article in the Journal of Medical Ethics (su...