Publications by David Smith

Revolution Newsletter: September 2011

09.09.2011

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full September edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Using Revolution R with Hadoop: Revolution Analytics has released three open-s...

3835 sym

Unlocking Big Data with R

09.09.2011

I have an article out this week on ReadWriteHack: Unlocking Big Data with R. My thanks to the folks at ReadWriteWeb for giving us the opportunity to showcase some of the many real-world Big Data applications of R. Here are some additional links about the applications mentioned in the article: New York Times: Destruction of the Haiti earthquake;...

974 sym

Why you should care about reproducible research

12.09.2011

This week's Economist has an in-depth article on the consequences of failures of reproducible research, adding more detail to the report in the New York Times in July. Errors in data analysis by researchers at Duke University led to patients in clinical trials being assigned the wrong drug: Dr Potti and his colleagues had mislabelled the cell lines...

2388 sym

Speed up recursion in R 600-fold with Rcpp

12.09.2011

Rcpp package co-author Dirk Eddelbuettel provides another case study in speeding up R code by rewriting repeatedly-called R code as inline C++ functions, using the classic Fibonacci recursion algorithm as an example. The speed gains here are impressive — over 600x compared to native recursive R code — but you could also improve performance b...

1918 sym

How to program MapReduce jobs in Hadoop with R

13.09.2011

MapReduce is a powerful programming framework for efficiently processing very large amounts of data stored in the Hadoop distributed filesystem. But while several programming frameworks for Hadoop exist, few are tuned to the needs of data analysts who typically work in the R environment as opposed to general-purpose languages like Java. That's w...

2806 sym

Revolution Analytics Fall Webinar Series

14.09.2011

We’ve lined up what we think is an amazing series of R-related webinars over the next couple of months. These free 30-60 minute webinars will cover a wide range of topics: big-data analysis in R with the RevoScaleR package, Hadoop and Netezza; introductions to R for SAS users and for R users new to Revolution R; and applications of R in Financ...

981 sym 1 tbl

Using Google Spreadsheets with R: an update

15.09.2011

Prompted by a rush of visitors from Andrew Gelman's blog, I went back and updated the details of my post from 2009 on reading data from Google Spreadsheets into R. Since then, Google has switched to using a secure (https) connection for Google Docs, which required some tweaks to the code. If you haven't seen it before, it's a neat way of using Go...

953 sym

How Lloyd’s of London uses R for Insurance

15.09.2011

Lloyd's is the world's leading specialist insurance market, and is often the first to insure new, unusual or complex risks. So it's no surprise that Lloyd's is one of the many companies that use R and its advanced capabilities for data analysis to help manage its insurance risks. At the useR! conference last month, Lloyd's analysts Markus Gesman...

1945 sym 10 img

How to extract time series from large timestamped logs with R

16.09.2011

Revolution Analytics' Joe Rickert has a new post on inside-R.org, demonstrating how you can use R and the RevoScaleR package to extract time series data from time-stamped logs (in this case, the “US Domestic Flights From 1990 to 2009” dataset on Infochimps): Analyzing time series data of all sorts is a fundamental business analytics task...

1498 sym

R 2.14 to be released on October 31; R 2.13 patch on September 13

19.09.2011

The next major release of R has been announced: R 2.14.0 is scheduled for October 31. Details are still coming in about the new features planned for this release, but R core member Luke Tierney has revealed some of the performance improvements expected, and R core member Brian Ripley has spoken of forthcoming low-level support for multi-thread...

1592 sym