Publications by David Smith
Faster R in Hadoop: rmr 1.3 now available
The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include: An optional vectorized API for efficient R programming when dealing with small records. Fast C implementations for serializati...
1543 sym
Civic Data Challenge closes July 29
There are only a few days left to enter the Civic Data Challenge: entries are due before midnight EST on July 29 to qualify for the $100,000 in prizes. The competition, open to US residents, challenges participants to create applications and visualizations from civic health data. Prizes will be awarded by a panel of prestigious judges. Looks like a great o...
974 sym
Revolution Newsletter: July 2012
The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full July edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Quick Start Program for Hadoop: Revolution Analytics makes it easy for data...
2132 sym
Another R mention in the NYT
The R language gets a brief mention in an article in yesterday's New York Times on automated bond trading: The traders here are mostly educated in math or physics, often outside the United States, and their desks are piled high with textbooks like the “R Graphs Cookbook,” for working with obscure computer programming languages. R an obscure...
1214 sym
Big vectors coming to R
R has been available as a 64-bit application since its earliest days. But the internal representation of R's fundamental data type, the vector, has long been subject to a 32-bit limitation: the maximum length is capped at 2^31 (or just over 2.1 billion) elements. Now, at 8 bytes per element that's 16GB of data, so that wasn't ...
2327 sym
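As a quick check of the arithmetic in the big-vectors excerpt above, here is a minimal sketch in Python. It assumes the 8-byte elements are double-precision values and reports the size in binary gigabytes (GiB); the variable names are illustrative only.

```python
# Arithmetic behind the 32-bit vector length limit quoted in the excerpt.
MAX_ELEMENTS = 2**31      # cap quoted in the excerpt (just over 2.1 billion)
BYTES_PER_ELEMENT = 8     # assumption: one double-precision value per element

total_bytes = MAX_ELEMENTS * BYTES_PER_ELEMENT
total_gib = total_bytes / 2**30   # convert bytes to binary gigabytes

print(f"{MAX_ELEMENTS:,} elements x {BYTES_PER_ELEMENT} bytes = {total_gib:.0f} GiB")
```

A vector at the quoted cap therefore occupies 16 GiB, which matches the figure in the excerpt.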
Revolution Analytics at JSM 2012
Revolution Analytics is proud to once again be a gold sponsor and Wi-Fi sponsor of the JSM 2012 conference in San Diego, the largest gathering of statisticians, biostatisticians, analysts, data miners and data scientists in the world. The conference begins on Sunday, and you'll find the Revolution Analytics team in the exhibit hall. Drop by to ta...
3301 sym
A prediction for the Olympic men’s 100m sprint
R user Markus Gesmann used the gold-winning times from the Olympic Men's 100m sprint since 1990 as the basis of the following prediction for the London Games: My simple log-linear model forecasts a winning time of 9.68 seconds, which is 1/100 of a second faster than Usain Bolt's winning time in Beijing in 2008, but still 1/10 of a second slower...
1320 sym 2 img
The Environmental Performance Index, visualized with R
The Environmental Performance Index (EPI) ranks countries on performance indicators for environmental public health and ecosystem vitality. Yale University hosts the EPI website, which was used to present the 2012 EPI Rankings to world leaders at the 2012 World Economic Forum at Davos. The Country Profiles section of the website allowed members t...
2239 sym 2 img
Hadley Wickham’s ggplot2 basics
If you haven't taken the plunge yet to making R graphics with Hadley Wickham's ggplot2 package, his “ggplot2 basics” slides (from the recent Introduction to Data Visualization and Analysis course at JSM) are a good place to start. Once you get the hang of the “grammar of graphics” notation, you'll be building beautiful data visualizations...
802 sym
R training: Visualization, Big Data, Data Mining, and Marketing Analytics
Revolution Analytics is hosting several live and online courses over the next couple of months that will be of interest to R users looking to hone their skills: Visualization in R with ggplot2. Garrett Grolemund and Winston Chang teach how to use the ggplot2 package to make, format, label and adjust graphs using R. (August 28, Redwood City,...
1504 sym