Publications by nsaunders
Make prettier documents by reusing chunks in RMarkdown
No revelations here, just a little R tip for generating more readable documents. [Image: original, with lots of code at the top] There are times when I want to show code in a document, but I don’t want it to be the first thing that people see. What I want to see first is the output from that code. In this silly example, I want the reader to focus their at...
Just how many retracted articles are there in PubMed anyway?
I am forever returning to PubMed data, downloaded as XML, trying to extract information from it and becoming deeply confused in the process. Take the seemingly simple question “how many retracted articles are there in PubMed?” Well, one way is to search for records with the publication type “Retracted Article”. As of right now, that retu...
PMRetract: PubMed retraction reporting rewritten as an interactive RMarkdown document
Back in 2010, I wrote a web application called PMRetract to monitor retraction notices in the PubMed database. It was written primarily as a way for me to explore some technologies: the Ruby web framework Sinatra, MongoDB (hosted at MongoHQ, now Compose) and Heroku, where the app was hosted. I automated the update process using Rake and the whole...
PubMed retraction reporting update
Just a quick update to the previous post. At the helpful suggestion of Steve Royle, I’ve added a new section to the report which attempts to normalise retractions by journal. So for example, J. Biol. Chem. has (as of now) 94 retracted articles and in total 170 842 publications indexed in PubMed. That becomes (100 000 / 170 842) * 94 = 55.022 re...
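The normalisation in the excerpt scales each journal's retraction count by its total number of PubMed-indexed publications. Written out for the J. Biol. Chem. figures quoted above, and assuming the result is read as retractions per 100 000 publications (as the scaling factor suggests), the arithmetic is:

```latex
\text{rate} = \frac{\text{retracted articles}}{\text{indexed publications}} \times 100\,000
            = \frac{94}{170\,842} \times 100\,000 \approx 55.022
```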
Configuring the R BatchJobs package for Torque batch queues
I was asked recently to look at some R code which performs “embarrassingly parallel” computations (the same function, multiple times, different parameters) and see whether I could modify it to run on one of our high-performance computing clusters. The machine has 63 virtual compute nodes and uses the TORQUE batch queue system to allocate node...
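For orientation: BatchJobs typically reads its cluster settings from a configuration file named `.BatchJobs.R` in the home or working directory. Torque support is selected with `makeClusterFunctionsTorque()`, which takes the path to a brew template that BatchJobs fills in to produce each `qsub` job script. The sketch below is minimal and the template path is a hypothetical placeholder. It is not necessarily the configuration used in the post.

```r
# .BatchJobs.R: minimal BatchJobs configuration for a Torque cluster (sketch).
# "/path/to/torque.tmpl" is a hypothetical placeholder; point it at your own
# brew template, which BatchJobs fills in to generate each qsub job script.
cluster.functions <- makeClusterFunctionsTorque("/path/to/torque.tmpl")
```

With that file in place, the usual BatchJobs workflow for an "embarrassingly parallel" job is to create a registry with `makeRegistry()`, map the function over its parameters with `batchMap()`, then call `submitJobs()` and `waitForJobs()`, and collect the output with `loadResults()`.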
Project Tycho, ggplot2 and the shameless stealing of blog ideas
Last week, Mick Watson posted a terrific article on using R to recreate the visualizations in this WSJ article on the impact of vaccination. Someone beat me to the obvious joke: “@BioMickWatson @pathogenomenick Nice quilt plot.” (Ed Yong, @edyong209, April 9, 2015). Someone also beat me to the standard response whenever base R graphics are used...
R 3.1 -> 3.2 upgrade notes
My machines upgraded from R version 3.1.3 to version 3.2.0 last week, which means that existing code suddenly cannot find packages and so fails. Some notes to myself, possibly useful to others, on what to do when this happens. Relevant to Ubuntu-based systems (I use Linux Mint). 1. Update packages: cp ~/R/x86_64-pc-linux-gnu-library/3.1 ~/R/x86_6...
Some basics of biomaRt
One of the commonest bioinformatics questions, at Biostars and elsewhere, takes the form: “I have a list of identifiers (X); I want to relate them to a second set of identifiers (Y)”. HGNC gene symbols to Ensembl Gene IDs, for example. When this occurs I have been known to tweet “the answer is BioMart” (there are often other solutions too...
Searching for the Steamer retroelement in the ocean metagenome
[Image: location of BLAST (tblastn) hits, Mya arenaria GagPol (AIE48224.1) vs GOS contigs] Last week, I was listening to episode 337 of the podcast This Week in Virology. It concerned a retrovirus-like sequence element named Steamer, which is associated with a transmissible leukaemia in soft shell clams. At one point the host and guests discussed the idea o...
Analysis of gene expression timecourse data using maSigPro
[Image: ANXA11 expression in human smooth muscle aortic cells post-ILb1 exposure] About a year ago, I did a little work on a very interesting project which was trying to identify blood-based biomarkers for the early detection of stroke. The data included gene expression measurements using microarrays at various time points after the onset of ischemia (redu...