Publications by nsaunders
R/ggplot2 tip: aes_string
I’m a big fan of ggplot2. Recently, I ran into a situation which called for a useful feature that I had not used previously: aes_string. Imagine that you have data consisting of observations for several variables – let’s say A, B, C – where each observation is from one of two groups – call them X and Y: df1 <- data.frame(A = rnorm(50),...
1585 sym R (1087 sym/5 pcs) 6 img 1 tbl
A brief note: R 3.0.0 and bioinformatics
Today marks the release of R 3.0.0. There will be plenty of commentary and useful information at sites such as R-bloggers (for example, Tal’s post). Version 3.0.0 is great news for bioinformaticians, due to the introduction of long vectors. What does that mean? Well, several months ago, I was using the simpleaffy package from Bioconductor to no...
1469 sym R (244 sym/2 pcs) 4 img
Using the Ensembl Variant Effect Predictor with your 23andme data
I subscribe to the Ensembl blog and found, in my feed reader this morning, a post which linked to the Variant Effect Predictor (VEP). The original blog post, strangely, has disappeared. Not to worry: so, the VEP takes genotyping data in one of several formats, compares it with the Ensembl variation + core databases and returns a summary of how th...
3090 sym R (1431 sym/8 pcs) 6 img
-omics in 2013
Just how many (bad) -omics are there anyway? Let’s find out. 1. Get the raw data. It would be nice if we could search PubMed for titles containing all -omics: *omics[TITL]. However, we cannot, since leading wildcards don’t work in PubMed search. So let’s just grab all articles from 2013 (2013[PDAT]) and save them in a format which includes t...
1414 sym R (2951 sym/6 pcs) 6 img 1 tbl
Interestingly: the sentence adverbs of PubMed Central
Scientific writing – by which I mean journal articles – is a strange business, full of arcane rules and conventions with origins that no-one remembers but to which everyone adheres. I’ve always been amused by one particular convention: the sentence adverb. Used with a comma to make a point at the start of a sentence, as in these examples: ...
8980 sym R (4765 sym/15 pcs) 12 img 2 tbl
Microarrays, scan dates and Bioconductor: it shouldn’t be this difficult
When dealing with data from high-throughput experimental platforms such as microarrays, it’s important to account for potential batch effects. A simple example: if you process all your normal tissue samples this week and your cancerous tissue samples next week, you’re in big trouble. Differences between cancer and normal are now confounded wi...
2779 sym R (1988 sym/8 pcs) 4 img
Bacteria and Alzheimer’s disease: I just need to know if ten patients are enough
You can guarantee that when scientists publish a study titled “Determining the Presence of Periodontopathic Virulence Factors in Short-Term Postmortem Alzheimer’s Disease Brain Tissue”, a newspaper will publish a story titled “Poor dental health and gum disease may cause Alzheimer’s”. Without access to the paper, it’s difficult to assess th...
4481 sym R (1800 sym/9 pcs) 6 img
R: how not to use savehistory() and source()
Admitting to stupidity is part of the learning process. So in the interests of public education, here’s something stupid that I did today. You’re working in the R console. Happy with your exploratory code, you decide to save it to a file: savehistory(file = "myCode.R"). Then, you type something else, for example: ls() # more lines here And t...
1499 sym R (104 sym/4 pcs) 4 img
Quilt plots. Like heat maps, only…heat maps
Stephen tweets: “Quilt Plots: A Simple Tool for the #Visualisation of Large Epidemiological Data buff.ly/1doSx4X” (Stephen Rudd, @SAGRudd, January 15, 2014). Quilt plots. Sounds interesting. The link points to a short article in PLoS ONE, containing a table and a figure. Here is Figure 1. If you looked at that and thought “Hey...
4906 sym 6 img
BLATting the internet: the most frequent gene?
I enjoyed this story from the OpenHelix blog today, describing a Microsoft Research project to mine DNA sequences from web pages and map them to UCSC genome builds. Laura DeMare asks: what was the most-hit gene? Most hit gene? APOE? MT @GenomeBrowser We BLATed the Internet! DNA sequences from 40 billion webpages mapped to hg19 goo.gl/7T2d5w— L...
1926 sym R (1129 sym/2 pcs) 4 img