Publications by nsaunders

APIs: I wish the life sciences would learn from social networks

10.12.2009

I was prompted by a thread on the apparent decline of FriendFeed to look for evidence of declining participation in my networks. First, a quick and dirty Ruby script, tls.rb, to grab the Life Scientists feed and count the likes and comments:

    #!/usr/bin/ruby
    require 'rubygems'
    require 'json/pure'
    require 'net/http'
    require 'open-uri'
    def format_...

2341 sym R (1399 sym/6 pcs) 14 img

The Life Scientists at FriendFeed: 2009 summary

23.12.2009

It’s Christmas Eve tomorrow and so I declare the year over. My Christmas gift to you is a summary of activity in 2009 at the FriendFeed Life Scientists group. It’s crafted using R + Ruby, with raw data and some code snippets available. If you want to see the most popular items from the group this year, head down to...

7197 sym R (4803 sym/9 pcs) 18 img

Samples per series/dataset in the NCBI GEO database

07.01.2010

Andrew asks: “I want to get an NCBI GEO report showing the number of samples per series or data set. Short of downloading all of GEO, anyone know how to do this? Is there a table of just metadata hidden somewhere?” At work, we joke that GEO is the only database where data goes in, but it won’t come out. However, there is an alternative: the G...

1754 sym R (1032 sym/2 pcs) 12 img

A new twist on the identifier mapping problem

11.01.2010

Yesterday, Deepak wrote about BridgeDB, a software package to deal with the “identifier mapping problem”. Put simply, biologists can name a biological entity in any way that they like, leading to multiple names for the same object. Easily solved, you might think, by choosing one identifier and sticking to it, but that’s apparently way too...

1814 sym R (331 sym/3 pcs) 16 img

From the “blogosphere”? Hardly.

27.01.2010

I generally skip over “From the Blogosphere”, a (mostly) weekly summary of one or two blog posts in Nature’s “Authors” section (here is the latest). Why? Well, I’ve always suspected that the title is rather misleading. Now, I have the hard numbers to prove it. My feed reader contains an archive of 128 articles, dating back to May 1...

1631 sym R (582 sym/1 pcs) 18 img

BioMart (and biomaRt)

26.03.2010

I’ve been vaguely aware of BioMart for a few years. Inexplicably, I’ve only recently started to use it. It’s one of the most useful applications I’ve ever used. The concept is simple. You have a set of identifiers that describe a biological object, such as a gene. These are called filters. They have values – for example, HGNC symb...

2086 sym R (2101 sym/2 pcs) 16 img
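
To make the filter/value idea in the BioMart teaser concrete, here is a minimal sketch using the biomaRt package from Bioconductor. It assumes biomaRt is installed and that the Ensembl mart is reachable. The dataset name, the choice of attributes, and the wrapper symbols_to_ensembl are illustrative choices, not taken from the post.

```r
# Hypothetical wrapper: map HGNC symbols to Ensembl gene IDs.
# Nothing is fetched until the function is actually called.
symbols_to_ensembl <- function(symbols) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("the biomaRt package (Bioconductor) is required")
  }
  # Connect to the Ensembl mart, human genes dataset (assumed choice).
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("hgnc_symbol", "ensembl_gene_id"),  # columns to return
    filters    = "hgnc_symbol",                        # what we search on
    values     = symbols,                              # the filter values
    mart       = mart
  )
}

# Example call (needs network access, so it is left commented out):
# symbols_to_ensembl(c("TP53", "BRCA1"))
```

The call returns a data frame with one row per match. Swapping the filters and attributes arguments would search on a different identifier type.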

Plotting “time of day” data using ggplot2

14.04.2010

William asks: “How can I make a graph that looks like this, ‘tweet density’ style, showing time intervals?” He then helpfully describes his input data: a CSV file with headers “time started, time finished, date”. Here’s a simple CSV file, tasks.csv:

    task,date,start,end
    task1,2010-03-05,09:00:00,13:00:00
    task2,2010-03-06,10:00:00,15:00...

1374 sym R (1585 sym/5 pcs) 18 img
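
As a rough sketch of the kind of plot the ggplot2 teaser describes, the R code below draws each task as a horizontal segment from its start time to its end time, one row per date. It assumes ggplot2 is installed. The task2 row and the helper read_tasks are made up for illustration, and this is not necessarily the approach taken in the full post.

```r
# Illustrative data in the tasks.csv layout; the task2 row is hypothetical.
csv_text <- "task,date,start,end
task1,2010-03-05,09:00:00,13:00:00
task2,2010-03-06,10:00:00,15:30:00"

# Hypothetical helper: parse the CSV and place every time on one dummy day,
# so that only the time of day varies along the x axis.
read_tasks <- function(text) {
  tasks <- read.csv(text = text, stringsAsFactors = FALSE)
  dummy_day <- "1970-01-01"
  tasks$start_time <- as.POSIXct(paste(dummy_day, tasks$start), tz = "UTC")
  tasks$end_time   <- as.POSIXct(paste(dummy_day, tasks$end), tz = "UTC")
  tasks$hours <- as.numeric(difftime(tasks$end_time, tasks$start_time,
                                     units = "hours"))
  tasks
}

tasks <- read_tasks(csv_text)

# Plot only if ggplot2 is available (an assumption about the local setup).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tasks) +
    geom_segment(aes(x = start_time, xend = end_time,
                     y = date, yend = date)) +
    scale_x_datetime(date_labels = "%H:%M") +
    labs(x = "time of day", y = "date")
  print(p)
}
```

Pinning every time to the same dummy day is the design choice that matters here: it lets a date-time axis stand in for a time-of-day axis.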

I’d be more than happy with the unlinked data web

14.04.2010

Visit this URL and you’ll find a perfectly formatted CSV file containing information about recent earthquakes. A nice feature of R is the ability to slurp such a URL straight into a data frame:

    quakes <- read.csv("http://neic.usgs.gov/neis/gis/qed.asc", header = TRUE)
    colnames(quakes)
    # [1] "Date" "TimeUTC" "Latitude" "Longitude" "Magnitu...

902 sym R (433 sym/1 pcs) 16 img
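
A small sketch of the same idea, hedged because the USGS URL in the teaser above may no longer resolve: read.csv accepts a file path, a URL, or inline text, so the example below substitutes inline text for the network call. The sample rows and the Mag column name are invented for illustration and are not real earthquake records.

```r
# Hypothetical sample in a similar layout; these are not real USGS records,
# and "Mag" is an assumed column name.
sample_csv <- "Date,TimeUTC,Latitude,Longitude,Mag
2010/04/14,01:02:03.0,10.5,-60.25,4.5
2010/04/14,04:05:06.0,-20.0,170.75,5.1"

# read.csv(text = ...) parses inline text; passing a URL string as the first
# argument instead would fetch and parse the remote file in the same way.
quakes <- read.csv(text = sample_csv, header = TRUE, stringsAsFactors = FALSE)

# Inspect the column names, then find the largest magnitude in the sample.
colnames(quakes)
max(quakes$Mag)
```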

Getting your web application and R(Apache) to talk to each other

19.04.2010

Here’s the situation. Web applications, built using a framework (e.g. Rails, Django), are great for fetching data from a database and rendering it. They’re not so great for crunching and charting the data. Conversely, R is great for crunching and charting, but doesn’t make for a great web application. The idea then, ...

5414 sym R (1475 sym/10 pcs) 20 img

Experiments with igraph

21.04.2010

Networks – social and biological – are all the rage just now. Indeed, a recent entry at Duncan’s QOTD described the “hairball” network representation as the dominant cultural icon in molecular biology. I’ve not had occasion to explore networks “professionally”, but have always been fascinated by both networks and the tools used ...

5106 sym R (2216 sym/7 pcs) 20 img