Publications by nsaunders

Box plots. Like box plots, only…box plots.

02.02.2014

On a rare, brief holiday (here and here, if you’re interested; both highly recommended), I make the mistake of checking my Twitter feed: “paging @neilfws . . . RT @psudmant: Ground breaking new methods from @naturemethods – boxplots – no rly nature.com/nmeth/journal/…” (Chris Miller, @chrisamiller, January 30, 2014). This points me to BoxP...

1235 sym 4 img

A minor update to my “apply functions” post

27.02.2014

One of my more popular posts is A brief introduction to “apply” in R. Come August, it will be four years old. Technology moves on, old blog posts do not. So: thanks to BioStar user zx8754 for pointing me to this Stack Overflow post, in which someone complains that the code in the post does not work as described. The by example is now fixed. S...

989 sym 4 img

This is why code written by scientists gets ugly

13.05.2014

There’s a lot of discussion around why code written by self-taught “scientist programmers” rarely follows what a trained computer scientist would consider “best practice”. Here’s a recent post on the topic. One answer: we begin with exploratory data analysis and never get around to cleaning it up. An example. For some reason, a resear...

1548 sym R (1878 sym/3 pcs) 4 img

Converting a spreadsheet of SMILES: my first OSM contribution

30.06.2014

I’ve long admired the work of the Open Source Malaria Project. Unfortunately, time and “day job” constraints prevent me from being as involved as I’d like. So: I was happy to make a small contribution recently in response to this request for help: Can anyone help @O_S_M to convert this spreadsheet ( malaria.ourexperiment.org/biological_dat...

2073 sym R (1356 sym/5 pcs) 4 img

When life gives you coloured cells, make categories

05.08.2014

Let’s start by making one thing clear. Using coloured cells in Excel to encode different categories of data is wrong. Next time colleagues explain excitedly how “green equals normal and red = tumour”, you must explain that (1) they have sinned and (2) what they meant to do was add a column containing the words “normal” and “tumour”....

2042 sym R (1010 sym/6 pcs) 4 img

Venn figures go wrong

12.08.2014

[Image: 6-way Venn banana] I thought nothing could top the classic “6-way Venn banana”, featured in The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. That is until I saw Figure 3 from Compact genome of the Antarctic midge is likely an adaptation to an extreme environment. [Image: 5-way Venn roadkill] What’s odd is that Figure 2 i...

1245 sym 8 img

Ebola, Wikipedia and data janitors

21.09.2014

Sometimes, several strands of thought come together in one place. For me right now, it’s the Wikipedia page “Ebola virus epidemic in West Africa”, which got me thinking about the perennial topic of “data wrangling”, how best to provide public data and why I can’t shake my irritation with the term “data science”. Not to mention Ebo...

2873 sym R (1584 sym/1 pcs) 8 img

PubMed Publication Date: what is it, exactly?

23.09.2014

File this one under “has troubled me (and others) for some years now, let’s try to resolve it.” Let’s use the excellent R/rentrez package to search PubMed for articles that were retracted in 2013.

    library(rentrez)
    es <- entrez_search("pubmed", "\"Retracted Publication\"[PTYP] 2013[PDAT]", usehistory = "y")
    es$count
    # [1] 117

117 articles....

2479 sym R (2677 sym/7 pcs) 4 img

Bioinformatics journals: time from submission to acceptance, revisited

13.10.2014

Before we start: yes, we’ve been here before. There was the Biostars question “Calculating Time From Submission To Publication / Degree Of Burden In Submitting A Paper.” That gave rise to Pierre’s excellent blog post and code + data on Figshare. So why are we here again? 1. It’s been a couple of years. 2. This is the R (+ Ruby) version....

3969 sym R (2667 sym/5 pcs) 6 img

Counting things is hard for a given value of “things”

01.12.2014

This post is just a summary of some interesting online discussion from last week around open access publishing. I learned a few things about definitions and PubMed/PMC filters. It all begins with an opinion piece, “Open access is tiring out peer reviewers.” With a title like that you might expect rebuttals from people like Michael Eisen and ...

2995 sym R (181 sym/2 pcs) 6 img