Publications by David Smith

Ryan Rosario on Parallel programming in R

17.08.2012

Earlier this year data scientist Ryan Rosario gave a talk on parallel computing with R to the Los Angeles R User Group, and he recently made the slides from the talk available online. They're a great resource for anyone looking to make use of multi-processor systems or a Hadoop-based architecture to speed computations with big data. Ryan's talk wa...

2439 sym

Getting Started with R and Hadoop

20.08.2012

Last week's meeting of the Chicago area Hadoop User Group (a joint meeting with the Chicago R User Group, and sponsored by Revolution Analytics) focused on crunching Big Data with R and Hadoop. Jeffrey Breen, president of Atmosphere Research Group, frequently deals with large data sets in his airline consulting work, and R is his “go-to tool for ...

2222 sym 2 img

Creating beautiful reports from R with knitr

21.08.2012

People use the R language every day to create the elements of reports: tables, charts, analyses, and forecasts. But assembling all of that information into a print-ready document laid out with text can be a hassle. You can cut-and-paste all of the elements into Word, but then what do you do when the data file gets updated at the last minute? (Answer...

1741 sym 2 img

Benchmarking random-number generation from C++

22.08.2012

If you're writing C++ code and want to generate random numbers, you might not be aware that R provides an API to call the R RNG functionality directly. The Rcpp package's “syntactic sugar” feature makes this process easier, by automating the process of translating a subset of ordinary R code into compiled C++ code. That means you can write co...

1750 sym

Revolution Analytics receives Top Innovator award for Data Science Technology

23.08.2012

A big thank-you to all the R users out there who voted for Revolution R Enterprise in the DataWeek Awards. We're so pleased to be recognized by the voters and the DataWeek judging panel with the Top Innovator Award for Data Science Technology. We're looking forward to the awards ceremony next week at DataWeek SF (in San Francisco, September 24-27). I...

1786 sym 2 img

Does playing football shorten your lifespan? (Answer: No.)

24.08.2012

A National Institute for Occupational Safety and Health study, published in March, found that professional American football (NFL) players lived longer, on average, than similar “mere mortals” in the general population. Football is a dangerous sport, so that might seem surprising at first, until you consider the fact that NFL players are el...

4006 sym 2 img 1 tbl

Two R community milestones

27.08.2012

Two significant R community milestones were achieved over the weekend. Firstly, the number of community-contributed R packages on CRAN is now above 4000. (As of this writing, it's 4004.) Figure 10 of The Popularity of Data Analysis Software charts the exponential growth of R packages: at the end of last year the figure stood at 3500, and the num...

1493 sym

Arctic sea-ice at lowest levels since observations began

28.08.2012

RealClimate.org used the R language and data from the National Snow and Ice Data Center to create this chart showing the extent of Arctic sea-ice in each year since satellite observations began in 1978, and the current extent of ice coverage (in red). Even though there are several weeks of annual melting yet to come, the area of ice covering the...

1526 sym 4 img

R does CSI: Using R to nail break-in suspects

29.08.2012

You've probably heard (or seen in TV shows) how the unique pattern of rifling in a gun barrel generates forensic evidence: microscopic scoring on the bullets left at the scene of the crime can be linked to the shooter by matching the marks to the firearm. What you might not know is that the same technique can be applied to other crimes: for exampl...

1908 sym 4 img

Three ways of visualizing the growth of Walmart

30.08.2012

It's a wonderful thing when people make interesting data sets available to the public. When Thomas Jones wrote a paper in Econometrics about the growth of US retail giant Walmart, he made the data he collected about every Walmart store opening in history (location and date) available to the public. Since then, several people have used differen...

1451 sym 2 img