Publications by Christopher Bare
Sage Bionetworks Synapse
Michael Kellen, Director of Technology at Sage Bionetworks, is trying to build a GitHub for science. It’s called Synapse, and Kellen described it in a talk at the Sage Bionetworks Commons Congress 2012 this past weekend: ‘Synapse’ Pilot for Building an ‘Information Commons’. To paraphrase Kellen’s intro: Science works better when pe...
2891 sym 2 img
Long-vector kludge in R
Just recently, I found out that R is limited to 32-bit integers, even on 64-bit hardware. Bummer, huh? As a consequence, the maximum size of a vector is 2^31-1. To be fair, dealing with numeric types across machine architectures is hard. A fixed representation has a lot of advantages. Java did the same thing, and now has a similar problem with 64...
2228 sym 4 img
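As a rough illustration of the 2^31-1 limit mentioned in the long-vector excerpt above, here is a minimal Python sketch of the arithmetic. The 8-byte element size (a double) and the helper names `max_vector_bytes` and `fits_in_vector` are assumptions made for this example, not something taken from the post.

```python
# Largest value a signed 32-bit integer can hold; the excerpt gives this
# as the maximum vector length.
INT32_MAX = 2**31 - 1

# Assumption for illustration: each element is an 8-byte double.
BYTES_PER_DOUBLE = 8


def max_vector_bytes(bytes_per_element: int = BYTES_PER_DOUBLE) -> int:
    """Bytes occupied by a vector holding INT32_MAX elements."""
    return INT32_MAX * bytes_per_element


def fits_in_vector(n_elements: int) -> bool:
    """True if n_elements can be indexed with a signed 32-bit integer."""
    return 0 <= n_elements <= INT32_MAX


if __name__ == "__main__":
    print("max length:", INT32_MAX)
    print("size in GiB at 8 bytes per element:", max_vector_bytes() / 2**30)
```

Under these assumptions, a vector of doubles at the maximum length takes just under 16 GiB, which is why the limit only starts to bite on large-memory 64-bit machines.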
Linear regression by gradient descent
In Andrew Ng’s Machine Learning class, the first section demonstrates gradient descent by using it on a familiar problem, that of fitting a linear function to data. Let’s start off by generating some bogus data with known characteristics. Let’s make y just a noisy version of x. Let’s also add 3 to give the intercept term something to do....
1712 sym R (1423 sym/5 pcs) 6 img
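The setup described in the gradient descent excerpt above (y as a noisy copy of x, plus 3 for the intercept) can be sketched roughly as follows. This is a Python stand-in, not the post's R code, and the sample size, noise level, learning rate `alpha`, iteration count, and function names are all assumptions chosen for illustration.

```python
import random


def make_data(n=100, intercept=3.0, noise_sd=0.3, seed=42):
    """Generate bogus data: y is x plus an intercept plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5, 5) for _ in range(n)]
    ys = [x + intercept + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def gradient_descent(xs, ys, alpha=0.05, n_iter=2000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent on squared error."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iter):
        # Prediction error for every point under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Gradient of the cost (1 / 2m) * sum(errors^2) for each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


if __name__ == "__main__":
    xs, ys = make_data()
    theta0, theta1 = gradient_descent(xs, ys)
    print("intercept:", theta0, "slope:", theta1)
```

With these settings the fitted intercept and slope should land close to the known values of 3 and 1, which is the point of generating data with known characteristics.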
OO in R
“Is there a package for obfuscating code in #rstats?”, someone asked. “The S4 object system?!” came the snarky reply. If you’re smiling right now, you know that it wouldn’t be funny if it weren’t at least a little bit true. Options: S3, S4 or R5? There can be little doubt that object oriented programming in R is the cause of some co...
4679 sym R (2473 sym/9 pcs) 2 img
Computing kook density in R
Do you ever see strange lights in the sky? Do you wonder what really goes on in Area 51? Would you like to use your R hacking skills to get to the bottom of the whole UFO conspiracy? Of course, you would! UFO data from infochimps is the focus of a data munging exercise in Chapter 1 of Machine Learning for Hackers by Drew Conway and John Myles Whi...
2008 sym R (141 sym/1 pcs) 4 img
Feature selection and linear modeling
Related To leave a comment for the author, please follow the link and comment on their blog: Digithead's Lab Notebook. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? clic...
406 sym
R in the Cloud
I’ve been having some great fun parallelizing R code on Amazon’s cloud. Now that things are chugging away nicely, it’s time to document my foibles so I can remember not to fall into the same pits of despair again. The goal was to perform lots of trials of a randomized statistical simulation. The jobs were independent and fairly chunky, taki...
7537 sym R (1959 sym/6 pcs) 4 img
Data analysis class
I’ve been writing software to help others do data analysis for a number of years and at the same time trying to work up my nerve to try my own analysis. Why let other people have all the fun? So, when I saw that Jeffrey Leek, biostatistician at Johns Hopkins and coauthor of Simply Statistics, was teaching an online course in data analysis, I si...
4416 sym 2 img
Playing with earthquake data
406 sym
Shiny talk by Joe Cheng
Shiny is a framework for creating web applications with R. Joe Cheng of RStudio, Inc. presented on Shiny last evening in Zillow’s offices 30 stories up in the former WaMu Center. Luckily, the talk was interesting enough to compete with the view of Elliott Bay aglow with late evening sunlight streaming through breaks in the clouds over the O...
1922 sym 2 img