Publications by Stephen Turner

Visualize coverage for targeted NGS (exome) experiments

20.03.2014

I’m calling variants from exome sequencing data and I need to evaluate the efficiency of the capture and the coverage along the target regions. This sounds like a great use case for bedtools, your Swiss Army knife for genomic arithmetic and interval manipulation. I’m lucky enough to be able to walk down the hall and bug Aaron Quin...

2663 sym 2 img

qqman: an R package for creating Q-Q and manhattan plots from GWAS results

15.05.2014

Three years ago I wrote a blog post on how to create manhattan plots in R. After hundreds of comments pointing out bugs and other issues, I’ve finally cleaned up this code and turned it into an R package. The qqman R package is on CRAN: http://cran.r-project.org/web/packages/qqman/ The source code is on GitHub: https://github.com/st...

4085 sym R (1578 sym/12 pcs) 16 img

Using Volcano Plots in R to Visualize Microarray and RNA-seq Results

28.05.2014

I’ve been asked a few times how to make a so-called volcano plot from gene expression results. A volcano plot typically plots some measure of effect on the x-axis (typically the fold change) and the statistical significance on the y-axis (typically the -log10 of the p-value). Genes that are highly dysregulated are farther to the lef...

1919 sym 2 img
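
The axis mapping described in the excerpt above can be sketched in a few lines. This is a minimal illustration in Python rather than R, not the code from the post itself: the function name `volcano_points`, the cutoff values, and the example rows are all hypothetical, and it assumes every p-value is strictly greater than zero.

```python
import math


def volcano_points(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Convert (gene, log2 fold change, p-value) rows into volcano-plot coordinates.

    fc_cutoff and p_cutoff are illustrative thresholds, not values from the post.
    """
    points = []
    for gene, log2fc, pvalue in results:
        x = log2fc                 # effect size goes on the x-axis
        y = -math.log10(pvalue)    # significance goes on the y-axis
        # Flag genes that pass both the effect-size and significance cutoffs.
        flagged = abs(log2fc) >= fc_cutoff and pvalue <= p_cutoff
        points.append((gene, x, y, flagged))
    return points


# Made-up example rows, for illustration only.
example = [
    ("geneA", 2.5, 1e-6),
    ("geneB", -3.0, 1e-4),
    ("geneC", 0.2, 0.5),
]

for gene, x, y, flagged in volcano_points(example):
    label = "highlight" if flagged else ""
    print(f"{gene}\tx={x:+.2f}\ty={y:.2f}\t{label}")
```

Taking the negative log of the p-value means smaller p-values land higher on the plot, so genes with both a large effect and strong statistical support end up in the upper left and upper right corners.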

Collaborative lesson development with GitHub

02.06.2014

If you’re doing any kind of scientific computing and not using version control, you’re doing it wrong. The git version control system and GitHub, a web-based service for hosting and collaborating on git-controlled projects, have both become wildly popular over the last few years. Late last year GitHub announced that the 10-million...

3959 sym

An Annotated Online Bioinformatics / Computational Biology Curriculum

13.06.2014

Two years ago David Searls published an article in PLoS Comp Bio describing a series of online courses in bioinformatics. Yesterday, the same author published an updated version, “A New Online Computational Biology Curriculum,” (PLoS Comput Biol 10(6): e1003662. doi: 10.1371/journal.pcbi.1003662). This updated curriculum has a supp...

2896 sym

Bedtools tutorial from 2013 CSHL course

24.06.2014

A couple of months ago I posted about how to visualize exome coverage with bedtools and R. But if you’re looking to get a basic handle on genome arithmetic, take a look at Aaron Quinlan’s bedtools tutorials from the 2013 CSHL course. The tutorial uses data from the Maurano et al. exploration of DNase I hypersensitivity sites in hun...

1447 sym 8 img

Introduction to R for Life Scientists: Course Materials

07.07.2014

Last week I taught a three-hour introduction to R workshop for life scientists at UVA’s Health Sciences Library. I broke the workshop into three sections: In the first half hour or so I presented slides giving an overview of R and why R is so awesome. During this session I emphasized reproducible research and gave a demonstration of u...

2102 sym 2 img

Do your "data janitor work" like a boss with dplyr

20.08.2014

Data “janitor-work”: The New York Times recently ran a piece on wrangling and cleaning data: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.” Whether you call it “janitor-work,” wrangling/munging, cleaning/cleansing/scrubbing, tidying, or something else, the article above is worth a read (even though it implicitly ...

6762 sym R (721 sym/4 pcs) 2 img

UVA / Charlottesville R Meetup

11.09.2014

TL;DR? We started an R Users group, awesome community, huge turnout at first meeting, lots of potential. I’ve sat through many hours of meetings where faculty lament the fact that their trainees (and the faculty themselves!) are woefully ill-prepared for our brave new world of computing- and data-intensive science. We’ve started...

3610 sym 2 img

R package to convert statistical analysis objects to tidy data frames

16.09.2014

I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject. R expects inputs to data analysis procedures to be in a tidy format, but the model output objects that you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are meant to take data frames, munge them arou...

1954 sym R (872 sym/2 pcs)