Publications by David Smith
In case you missed it: March 2014 roundup
In case you missed them, here are some articles from March of particular interest to R users: Francis Smart offers five excellent reasons to use R, and notes that R is the top Google Search for statistical software. Revolution Analytics is offering R training for SAS users in Singapore and online. The number of R user groups worldwide con...
Animated Choropleths in R
Ari Lamstein has updated his choroplethr package with a new capability for creating animated data maps. I can't embed the animated version here, but click the image below to see an animation of US counties by average household income, from the richest to the poorest by percentile. (The code behind the animation is available on github.) The chlor...
R 3.1.0 "Spring Dance" is released
As announced this morning on the mailing list, R 3.1.0 (codenamed “Spring Dance”) has been released. The source code is available now; as of this writing binary versions haven't yet appeared on CRAN or propagated to the mirrors, but I expect they'll be available in a day or two. You can check out the full list of changes in the Release Notes, but...
Create an impressionist self-portrait from your Twitter followers
Here's something fun you can do with R and its interface to Twitter, the twitteR package. An R script by CMU student Mark Patterson downloads your Twitter profile picture, counts the number of Twitter followers you have, and then creates a pointillist version of your profile picture with as many dots as you have followers. Here's mine: Note that...
Interfacing R with Web technologies
A new Task View on CRAN will be of interest to anyone who needs to connect R with Web-based applications. The Web Technologies and Services Task View lists R functions and packages for reading data from websites (via public APIs or by scraping data from HTML pages); for interfacing with Cloud-based platforms (including AWS); for authenticating and accessin...
Why writing vectorized code in R is a good idea
As a language for statistical computing, R has always had a bias towards linear algebra, and is optimized for operations dealing in complete vectors and matrices. This can be surprising to programmers coming to R from lower-level languages, where iterative programming (looping over the elements of a vector or matrix) is more natural and often mor...
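To make the contrast concrete, here is a minimal sketch (not taken from the original post) that computes a sum of squares two ways: with an explicit loop, and with R's vectorized arithmetic. The function names `sum_sq_loop` and `sum_sq_vec` are hypothetical, and timings depend on the machine and R version, so none are quoted here.

```r
# Minimal sketch: explicit loop vs. vectorized sum of squares.
# Function names are hypothetical, chosen for illustration only.
set.seed(42)
x <- runif(1e6)

# Loop version: visits each element in turn in interpreted R code
sum_sq_loop <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    total <- total + v[i]^2
  }
  total
}

# Vectorized version: `^` squares the whole vector at once and
# sum() adds it up, both in compiled code
sum_sq_vec <- function(v) sum(v^2)

# Compare elapsed times on your own machine
system.time(a <- sum_sq_loop(x))
system.time(b <- sum_sq_vec(x))

# Same answer, up to floating-point tolerance
all.equal(a, b)  # TRUE
```

The vectorized form is also shorter and closer to the mathematical notation, which is a large part of why idiomatic R favors it.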
DM Radio on Data Science
A couple of weeks ago, I participated in a panel discussion for DM Radio: “Still Sexy? How's that Data Scientist Gig Working Out?”. The title was provocative, but the discussion mostly revolved around the rise of data science and how advanced analytics (often implemented with R) is changing the way many companies do business today. Also on th...
R and the weather in the local news
The Mountain View Voice is a weekly newspaper serving the Silicon Valley area, and is a familiar sight to anyone wandering the streets of Palo Alto or Menlo Park. Angela Hey writes for 'Hey Tech!', an online blog of the Voice, and has just published a feature on R and the local Bay Area useR Group (BARUG). It includes a nice history of R, and ...
Webinar: Big-Data Trees for R
If you missed last week's webinar presented by Revolution Analytics' US Chief Scientist Mario Inchiosa, “Decision Trees built in Hadoop plus more Big Data Analytics with Revolution R Enterprise”, the slides and webinar replay are now available for download. The webinar includes a demo of building decision trees and regression trees in Revolution R...
Simpson’s Paradox in a nutshell
Norm Matloff points us to a pithy example that sums up Simpson's Paradox perfectly, captured in the title of a medical paper: “Good for Women, Good for Men, Bad for People”. He explains how Simpson's Paradox isn't a paradox at all, but just the consequence of including a minor variable in a model ahead of a more significant variable, and illu...
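The reversal in that title is easy to reproduce with a toy table. The sketch below uses invented counts (they are not the data from the paper) in which the treatment is given mostly to the group with the lower baseline recovery rate. Within each sex the treated patients do better, but once sex is left out of the comparison, the pooled rates point the other way.

```r
# Hypothetical counts, invented for illustration (not from the paper).
# Women recover more often than men regardless of treatment, and most
# treated patients are men.
d <- data.frame(
  sex       = c("women",   "women",   "men",     "men"),
  treatment = c("treated", "control", "treated", "control"),
  n         = c(10, 90, 90, 10),
  recovered = c(9, 72, 27, 2)
)

# Recovery rate within each sex: treated beats control for both
# (women 0.9 vs 0.8, men 0.3 vs 0.2)
d$rate <- d$recovered / d$n
print(d)

# Pool over sex, i.e. drop the more important variable:
# control now beats treated (0.74 vs 0.36)
pooled <- aggregate(cbind(recovered, n) ~ treatment, data = d, FUN = sum)
pooled$rate <- pooled$recovered / pooled$n
print(pooled)
```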