Publications by David Smith

Video: Data Mining with R

15.02.2013

Yesterday's Introduction to R for Data Mining webinar was a record setter, with more than 2000 registrants and more than 700 attending the live session presented by Joe Rickert. If you missed it, I've embedded the video replay below, and Joe's slides (with links to many useful resources) are also available. During the webinar, Joe demoed several...

1059 sym

10 R packages every data scientist should know about

18.02.2013

The yhat blog lists 10 R packages they wish they'd known about earlier. Drew Conway calls them “10 reasons to always start your analysis in R”. They're all very useful R packages that every data scientist should be aware of. They are: sqldf (for selecting from data frames using SQL), forecast (for easy forecasting of time series), plyr (data a...

1185 sym

Visualize major league pitching data with PitchRx

19.02.2013

Anyone interested in playing around with the data generated by the PITCHf/x cameras at major league baseball games should definitely check out the pitchRx package from Carson Sievert. Major League Baseball Advanced Media makes the data available for download, and this package provides an interface from R to the speed, position and pitcher data ...

1347 sym

Quandl: A Wikipedia for Time Series Data

20.02.2013

This guest post is by Tammer Kamel, Founder of Quandl. Finding and formatting numerical data for analysis in R or Excel, or indeed any application, is a pain that all real-world data analysts know all too well. In aggregate I have probably spent weeks of my life trying to find data on the web. And several more weeks validating, formatting and...

3495 sym

R in the news: Interviews with Revolution Analytics execs

22.02.2013

Here are three recent news articles that feature interviews with members of the Revolution Analytics team talking about the importance of the R language: In Forbes, CEO Dave Rich talks to Gil Press about the business landscape for Big Data. In the article, Dave says: SAS and SPSS remind me of Cobol and Fortran circa 1995. The scientific and aca...

2035 sym

Video: IBM Opinionated Infrastructure Hangout

22.02.2013

Had a great time earlier this week on a Google Hangout as part of the IBM Opinionated Infrastructure series. Moderator James Governor (analyst from RedMonk) kept the conversation lively, with topics ranging from the value of information to the benefits of predictive analytics and the evolution of Hadoop. R gets a mention at several points in the ...

947 sym

Free e-book on Data Science with R

22.02.2013

A new book by Jeffrey Stanton from Syracuse University School of Information Studies, An Introduction to Data Science, is now available for free download. The book, developed for Syracuse's Certificate for Data Science, is available under a Creative Commons License as a PDF (20 MB) or as an interactive eBook from iTunes. The book begins with the...

1773 sym 2 img

Revolution Newsletter: February 2013

25.02.2013

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full February edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Case study: Real-Time Marketing Analytics. Online advertising technolo...

3705 sym

What is Revolution R Enterprise?

25.02.2013

Let us explain, in 90 seconds: Want a more in-depth introduction to R and Revolution R Enterprise? I'll be giving the webinar Revolution R Enterprise: 100% R and More on March 14. Just follow the link below to secure your seat for the live presentation, and to receive notification of the replay. Revolution Analytics webinars: Revolution R Enter...

764 sym

New ways to Hadoop with R

26.02.2013

Today, there are two main ways to use Hadoop with R and big data: 1. Use the open-source rmr package to write map-reduce tasks in R (running within the Hadoop cluster – great for data distillation!) 2. Import data from Hadoop to a server running Revolution R Enterprise, via HBase, ODBC (for high-performance Hadoop/SQL interfaces), or streami...

1855 sym