Publications by David Smith
Resampling data in Hadoop with RHadoop
On Revolution Analytics partner Cloudera's blog, Uri Laserson has posted an excellent guide to resampling from a large data set in Hadoop. Resampling is an important step in fitting ensemble models (including random forests and other bagging techniques), and Uri provides a step-by-step guide to implementing resampling methods using RHadoop. He p...
1117 sym
Slides from "Big Data Real Time Predictive Analytics"
At Tuesday's Data Driven Business Day at the Strata conference I gave my talk, Real-time Big Data Predictive Analytics: From Deployment to Production. My goal in the talk was to explain the buzz-phrases “real time”, “big data” and “predictive analytics” in the context of a specific example: why are some web ads today uncannily targete...
1055 sym
Revolution Analytics News Roundup
Between the Strata conference and various announcements, last week was certainly a busy one for the crew here at Revolution Analytics. So I thought I'd take the opportunity to catch you up on some of the recent media articles you might have missed: The Wall Street Journal interviewed our new VP of Services Neera Talbert on the trend towards hir...
1605 sym
Track the bookies’ favourites for the next Pope
Tired of manually running a Python script to scrape the latest bookmaker odds on the next Pope, R user AJ (an analytical research manager at a large healthcare company) instead created an R script to track the odds on the Papal successor, and automated it with the Shiny package for R. The screenshot below shows the odds of each candidate being el...
1106 sym 2 img
Quandl package released to CRAN
In a guest post here on February 20, Tammer Kamel introduced us to Quandl, a kind of “Wikipedia” of time series data. In the post, Tammer (the founder of Quandl) noted that they were working on an R package to give R users access to Quandl as a data source. That package is now available. It includes the Quandl function, which R users can g...
1012 sym
Interview with Boulder BI Brain Trust
On Friday I traveled to Boulder, CO to update the Boulder BI Brain Trust on the latest news and updates from Revolution R Enterprise. While I was there, I was interviewed by BBBT president Claudia Imhoff. In a wide-ranging chat, we discussed: What's behind the Revolution Analytics momentum over the past year? How Business Intelligence relates ...
1153 sym
A map of worldwide email traffic, created with R
The Washington Post reports that by analyzing more than 10 million emails sent through the Yahoo! Mail service in 2012, a team of researchers used the R language to create a map of countries whose citizens email each other most frequently: The chart above shows the top 1000 country-country pairs by email frequency, arranged in a clustered network ...
2322 sym 2 img
Webinar tomorrow: 100% R and More
A quick note that I'll be hosting our regularly-scheduled webinar, Revolution R Enterprise, 100% R and More, at 10AM Pacific tomorrow. If you're new to R, or want to learn about the power, scalability and productivity features of Revolution R Enterprise, this is a great place to start. Revolution Analytics webinars: Revolution R Enterprise, 100%...
752 sym
In case you missed it: February 2013 Roundup
In case you missed them, here are some articles from February of particular interest to R users. How to resample from a large data set with RHadoop, and a video introduction to the RHadoop packages. A 90-second video explains: What is Revolution R Enterprise? Jeffrey Stanton has published a free e-book “An Introduction to Data Science” us...
2556 sym
Replay of Revolution R Enterprise: 100% R and More
If you missed last week's broadcast of the webinar Revolution R Enterprise: 100% R and More, I've embedded the replay below. If you're not familiar with the power, productivity and enterprise readiness that Revolution R Enterprise brings to open source R, this is a good place to start. Slides from the webinar and a downloadable video of the repl...
855 sym