Publications by Chris
Results of the St. Pat’s 10 Miler and 5K
Recently I ran the St. Pat’s 10 Miler in Atlantic City, NJ. It was my first official running event ever, and I enjoyed it a lot. Shortly after the race, the official results were posted on the Internet. The data included not only the numbers and times of the participants but also their gender and age. Looking at the finisher time distribution it s...
How to plot a graph in R
Here’s a quick tutorial on how to get a nice-looking graph out of R (aka the R Project for Statistical Computing). Don’t forget that help for any R command can be displayed by typing the question mark followed by the command. For example, to see help on plot, type ?plot. Let’s start with some data from your friends, the Federal Reserve. The ...
R String processing
Here’s a little vignette of data munging using the regular expression facilities of R (aka the R-project for statistical computing). Let’s say I have a vector of strings that looks like this:
> coords
 [1] "chromosome+:157470-158370" "chromosome+:158370-158450" "chromosome+:158450-158510"
 [4] "chromosome+:158510-159330" "chromosome-:157460-1585...
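For readers who want the idea behind this post without R at hand, here is a rough Python analogue of pulling the pieces out of such strings with a regular expression. It is a sketch only: the pattern assumes every string has the shape `<name><strand>:<start>-<end>`, as in the sample above, and `parse_coord` and `COORD_RE` are hypothetical names, not code from the post.

```python
import re

# Assumed shape of each string: <name><strand>:<start>-<end>,
# e.g. "chromosome+:157470-158370". The name may not contain '+', '-' or ':'.
COORD_RE = re.compile(r"^(?P<name>[^+\-:]+)(?P<strand>[+-]):(?P<start>\d+)-(?P<end>\d+)$")


def parse_coord(s):
    """Split one coordinate string into (name, strand, start, end)."""
    m = COORD_RE.match(s)
    if m is None:
        raise ValueError(f"unrecognized coordinate string: {s!r}")
    return m.group("name"), m.group("strand"), int(m.group("start")), int(m.group("end"))


coords = [
    "chromosome+:157470-158370",
    "chromosome+:158370-158450",
    "chromosome+:158450-158510",
]
parsed = [parse_coord(c) for c in coords]
# parsed[0] == ("chromosome", "+", 157470, 158370)
```

Converting start and end to integers right away means later steps (sorting, computing lengths) work on numbers rather than strings.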
Parsing GEO SOFT files with Python and Sqlite
NCBI’s GEO database of gene expression data is a great resource, but its records are very open-ended. This lack of rigidity was perhaps necessary to accommodate the variety of measurement technologies, but from the point of view of extracting data, all that flexibility is a curse: it makes getting data out a little tricky. The scripts I end up...
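As a rough illustration of the general approach (not the scripts from the post), here is a minimal Python sketch that groups SOFT lines into entities and loads one data table into an in-memory SQLite database. It assumes the usual SOFT line prefixes (`^` starts an entity, `!` marks an attribute, `#` describes a column) and `!..._table_begin` / `!..._table_end` markers around a tab-delimited table. The helper names `parse_soft` and `load_table` and the sample record `GSM_EXAMPLE` are hypothetical.

```python
import sqlite3


def parse_soft(lines):
    """Group SOFT lines into entities (a minimal sketch, not a full parser).

    Assumed conventions: '^TYPE = ID' starts a new entity, '!key = value'
    is an attribute of the current entity, and lines between a
    '!..._table_begin' and '!..._table_end' marker form a tab-delimited
    data table whose first line is the column header.
    """
    entities = []
    current = None
    in_table = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("^"):
            etype, _, eid = line[1:].partition(" = ")
            current = {"type": etype, "id": eid, "attrs": {}, "header": None, "rows": []}
            entities.append(current)
        elif line.lower().endswith("_table_begin"):
            in_table = True
        elif line.lower().endswith("_table_end"):
            in_table = False
        elif in_table and current is not None:
            fields = line.split("\t")
            if current["header"] is None:
                current["header"] = fields
            else:
                current["rows"].append(fields)
        elif line.startswith("!") and current is not None:
            key, _, value = line[1:].partition(" = ")
            # Attributes may repeat, so each key maps to a list of values.
            current["attrs"].setdefault(key, []).append(value)
        # '#' column-description lines are ignored in this sketch.
    return entities


def load_table(conn, entity):
    """Create one SQLite table per entity, named after its ID.

    Assumes the ID and column names contain no double quotes.
    """
    table = entity["id"]
    cols = entity["header"]
    col_defs = ", ".join(f'"{c}"' for c in cols)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', entity["rows"])
    conn.commit()


# A tiny made-up record in SOFT layout, for illustration only.
example = """^SAMPLE = GSM_EXAMPLE
!Sample_title = hypothetical sample
#ID_REF = probe identifier
#VALUE = normalized signal
!sample_table_begin
ID_REF\tVALUE
p1\t1.5
p2\t0.25
!sample_table_end
""".splitlines()

entities = parse_soft(example)
conn = sqlite3.connect(":memory:")
load_table(conn, entities[0])
rows = conn.execute('SELECT ID_REF, VALUE FROM "GSM_EXAMPLE" ORDER BY ID_REF').fetchall()
```

Values are stored as text here because the columns are declared without types; a fuller version would use the `#` column descriptions or the platform record to decide which columns are numeric.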
Select operations on R data frames
The R language is weird – particularly for those coming from a typical programmer’s background, which likely includes OO languages in the curly-brace family and relational databases using SQL. A key data structure in R, the data.frame, is used much like a table in a relational database. In terms of R’s somewhat byzantine type system (w...
Using R and Bioconductor for sequence analysis
Here’s another quick R vignette, in case I pick this up later and need to remind myself where I got stuck. I was trying to use R for a bit of basic sequence analysis, with mixed results. First, install the BSgenome package, which is part of Bioconductor. Get GeneR while you’re at it.
> source("http://bioconductor.org/biocLite.R")
> biocLite("BS...
Joining data frames in R
Want to join two R data frames on a common key? Here’s one way to do a SQL-style join operation in R. We start with a data frame describing probes on a microarray. The key is the probe_id, and the rest of the information describes the location on the genome targeted by that probe.
> head(probes)
  probe_id sequence strand start ...
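The join itself is easy to spell out. Here is a minimal Python sketch of an inner join on a shared key, done as a hash join: index one table by the key, then look up each row of the other. In R, base `merge()` covers this case; the Python version is only meant to make the mechanics visible. The rows below are made-up stand-ins for the probe data, and `inner_join` is a hypothetical helper.

```python
def inner_join(left, right, key):
    """Inner join two lists of dicts on `key` (SQL: ... JOIN ... USING (key))."""
    # Build an index from key value to all matching rows of `right`.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Emit one merged row per matching (left, right) pair; unmatched rows are dropped.
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined


# Hypothetical probe annotations and measurements sharing the key "probe_id".
probes = [
    {"probe_id": 1, "strand": "+", "start": 100},
    {"probe_id": 2, "strand": "-", "start": 250},
]
signal = [
    {"probe_id": 2, "value": 0.8},
    {"probe_id": 3, "value": 1.1},
]
joined = inner_join(probes, signal, "probe_id")
# joined == [{"probe_id": 2, "strand": "-", "start": 250, "value": 0.8}]
```

Indexing the right-hand table first keeps the join linear in the total number of rows instead of comparing every pair.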
SQL group by in R
The R statistical computing environment is awesome, but weird. How to do database operations in R is a common source of questions. The other day I was looking for an equivalent to SQL group by for R data frames. You need this to compute summary statistics on subsets of a data set divided up by some categorical variable. It turns out there are sev...
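To make the SQL side of the comparison concrete, here is a Python sketch that computes a per-group mean by hand and checks it against SQLite's `GROUP BY`, using the standard-library `sqlite3` module. It illustrates the operation being asked about, not any of the R approaches from the post. The data, the column names `treatment` and `signal`, and the helper `group_mean` are hypothetical.

```python
import sqlite3
from collections import defaultdict
from statistics import mean


def group_mean(rows, by, value):
    """Mean of `value` within each level of the categorical column `by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    return {level: mean(vals) for level, vals in groups.items()}


rows = [
    {"treatment": "heat", "signal": 2.0},
    {"treatment": "heat", "signal": 4.0},
    {"treatment": "cold", "signal": 1.0},
]
by_hand = group_mean(rows, "treatment", "signal")

# The same summary expressed as a SQL GROUP BY query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (treatment TEXT, signal REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(r["treatment"], r["signal"]) for r in rows])
by_sql = dict(conn.execute("SELECT treatment, AVG(signal) FROM t GROUP BY treatment").fetchall())
# by_hand == by_sql == {"heat": 3.0, "cold": 1.0}
```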
Pivot tables in R
A common data-munging operation is to compute cross tabulations of measurements by categories. SQL Server and Excel have a nice feature called pivot tables for this purpose. Here we’ll figure out how to do pivot operations in R. Let’s imagine an experiment where we’re measuring the gene activity of an organism under different conditions — ...
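A pivot table is a group-by over two categorical keys, laid out as a grid. The Python sketch below shows that idea; it is not the R approach from the post. It assumes the mean as the aggregate (pivot tables in Excel and SQL Server let you choose others), fills combinations with no measurements with `None`, and uses hypothetical genes, conditions, and helper names.

```python
from collections import defaultdict
from statistics import mean


def pivot(rows, row_key, col_key, value):
    """Cross-tabulate the mean of `value` by (row_key, col_key)."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_key], r[col_key])].append(r[value])
    row_labels = sorted({k[0] for k in cells})
    col_labels = sorted({k[1] for k in cells})
    # Empty combinations become None; the membership test avoids creating new keys.
    table = {
        rl: {cl: mean(cells[(rl, cl)]) if (rl, cl) in cells else None for cl in col_labels}
        for rl in row_labels
    }
    return row_labels, col_labels, table


# Hypothetical measurements: gene activity under two conditions.
measurements = [
    {"gene": "a", "condition": "heat", "activity": 2.0},
    {"gene": "a", "condition": "heat", "activity": 4.0},
    {"gene": "a", "condition": "cold", "activity": 1.0},
    {"gene": "b", "condition": "cold", "activity": 5.0},
]
genes, conditions, table = pivot(measurements, "gene", "condition", "activity")
# table == {"a": {"cold": 1.0, "heat": 3.0}, "b": {"cold": 5.0, "heat": None}}
```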
The R type system
R is a weird beast. Through its ancestor, the S language, it claims a proud heritage reaching back to Bell Labs in the 1970s, when S was created as an interactive wrapper around a set of statistical and numerical subroutines. As a programming language, R takes ideas from Unix shell scripting, functional languages (Lisp and ML), and also a lit...