Publications by David Smith

Data Journalism with R at FiveThirtyEight

27.07.2016

Since it expanded its focus from predicting the US election, FiveThirtyEight has emerged as a prominent source of in-depth data journalism, with data-driven analysis of media, culture, politics and society. A recent feature combined CDC and independent data sources to break down the nearly 34,000 gun deaths in the US in 2014 by cause of death and...

2019 sym 4 img

R moves up to 5th place in IEEE language rankings

29.07.2016

IEEE Spectrum has just published its third annual ranking with its 2016 Top Programming Languages, and the R Language is once again near the top of the list, moving up one place to fifth position. As I said last year (when R moved up to take sixth place), this is an extraordinary result for a domain-specific language. The other four languages in...

3104 sym 2 img

Azure ML Studio now supports Microsoft R Open, Python 3

01.08.2016

In Azure ML Studio, you use a browser-based “workbench” tool to flow data through pre-built data munging, machine learning and predictive modeling modules. These pre-built components perform computations in the Azure cloud and cover just about everything you'd want to do with data, including data transformation tools (add/remove columns, s...

2260 sym 2 img

Farewell, inside-r.org: where to find alternative R resources

03.08.2016

We recently had to decommission inside-r.org, the R resources site created by Revolution Analytics in 2010. Links to inside-r.org now redirect to MRAN, which contains much of the material that was there, including an R package directory, examples of R applications, and a guide for getting started with R. But if you're looking for something t...

3532 sym 2 img

Interactive, Illustrator-quality graphics with R

05.08.2016

While many media properties including the New York Times, FiveThirtyEight and FlowingData use the R language to prepare graphics for publication, they often use Adobe Illustrator or similar graphics tools to touch up the last 5% or so of the graphics. Not so for Switzerland's news site swissinfo.ch, whose data journalist Duc-Quang Nguyen creat...

2117 sym 2 img

New cheat-sheet for the dplyrXdf package

08.08.2016

Hadley Wickham's dplyr package is an amazing tool for restructuring, filtering, and aggregating data sets using its elegant grammar of data manipulation. By default, it works on in-memory data frames, which means you're limited to the amount of data you can fit into R's memory. Hadley also provided an extension mechanism to make dplyr work with e...

1942 sym 2 img

In case you missed it: July 2016 roundup

10.08.2016

In case you missed them, here are some articles from July of particular interest to R users. R moves up to 5th place in the annual IEEE Spectrum programming language rankings. A guide to R-related presentations at the JSM 2016 conference. FiveThirtyEight uses R extensively for data journalism, as explained in a presentation at useR!2016. An in...

2552 sym

Tuning Apache Spark for faster analysis with Microsoft R Server

12.08.2016

My colleagues Max Kaznady, Jason Zhang, Arijit Tarafdar and Miguel Fierro recently posted a really useful guide with lots of tips to speed up prototyping models with Microsoft R Server on Apache Spark. These tips apply when using Spark on Azure HDInsight, where you can spin up a Spark cluster in the cloud with Microsoft R installed on the head nod...

2740 sym 4 img

The inexorable growth of student debt, charted with R

15.08.2016

Len Kiefer, Deputy Chief Economist at Freddie Mac, recently published the following chart to his personal blog showing household debt in the United States (excluding mortgage debt). As you can see, student loan debt has steadily increased over the last 13 years and has now eclipsed all other forms of non-mortgage debt: He also created this ani...

1600 sym 4 img

Extract tables from messy spreadsheets with jailbreakr

17.08.2016

R has some good tools for importing data from spreadsheets, among them the readxl package for Excel and the googlesheets package for Google Sheets. But these only work well when the data in the spreadsheet are arranged as a rectangular table, and not overly encumbered with formatting or generated with formulas. As Jenny Bryan pointed out in her ...

2255 sym 2 img