Publications by David Smith
Revolution Newsletter: September 2011
The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full September edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Using Revolution R with Hadoop: Revolution Analytics has released three open-s...
3835 sym
Unlocking Big Data with R
I have an article out this week on ReadWriteHack: Unlocking Big Data with R. My thanks to the folks at ReadWriteWeb for giving us the opportunity to showcase some of the many real-world Big Data applications of R. Here are some additional links about the applications mentioned in the article: New York Times: Destruction of the Haiti earthquake;...
974 sym
Why you should care about reproducible research
This week's Economist has an in-depth article on the consequences of failures in reproducible research, adding more detail to the report in the New York Times in July. Errors in data analysis by researchers at Duke University led to patients in clinical trials being assigned the wrong drug: Dr Potti and his colleagues had mislabelled the cell lines...
2388 sym
Speed up recursion in R 600-fold with Rcpp
Rcpp package co-author Dirk Eddelbuettel provides another case study in speeding up R code by rewriting repeatedly-called R code as inline C++ functions, using the classic Fibonacci recursion algorithm as an example. The speed gains here are impressive — over 600x compared to native recursive R code — but you could also improve performance b...
1918 sym
How to program MapReduce jobs in Hadoop with R
MapReduce is a powerful programming framework for efficiently processing very large amounts of data stored in the Hadoop distributed filesystem. But while several programming frameworks for Hadoop exist, few are tuned to the needs of data analysts who typically work in the R environment as opposed to general-purpose languages like Java. That's w...
2806 sym
Revolution Analytics Fall Webinar Series
We’ve lined up what we think is an amazing series of R-related webinars over the next couple of months. These free 30-60 minute webinars will cover a wide range of topics: big-data analysis in R with the RevoScaleR package, Hadoop and Netezza; introductions to R for SAS users and for R users new to Revolution R; and applications of R in Financ...
981 sym 1 tbl
Using Google Spreadsheets with R: an update
Prompted by a rush of visitors from Andrew Gelman's blog, I went back and updated the details of my post from 2009 on reading data from Google Spreadsheets into R. Since then, Google has switched to using a secure (https) connection for Google Docs, which required some tweaks to the code. If you haven't seen it before, it's a neat way of using Go...
953 sym
How Lloyd’s of London uses R for Insurance
Lloyd's is the world's leading specialist insurance market, and is often the first to insure new, unusual or complex risks. So it's no surprise that Lloyd's is one of the many companies that use R and its advanced capabilities for data analysis to help manage its insurance risks. At the useR! conference last month, Lloyd's analysts Markus Gesman...
1945 sym 10 img
How to extract time series from large timestamped logs with R
Revolution Analytics' Joe Rickert has a new post on inside-R.org, demonstrating how you can use R and the RevoScaleR package to extract time series data from time-stamped logs (in this case, the “US Domestic Flights From 1990 to 2009” dataset on Infochimps): Analyzing time series data of all sorts is a fundamental business analytics task...
1498 sym
R 2.14 to be released on October 31; R 2.13 patch on September 13
The next major release of R has been announced: R 2.14.0 is scheduled for October 31. Details are still coming in about the new features planned for this release, but R core member Luke Tierney has revealed some of the performance improvements expected, and R core member Brian Ripley has spoken of forthcoming low-level support for multi-thread...
1592 sym