Publications by David Smith
Microsoft R Open 3.5.0 now available
Microsoft R Open 3.5.0 is now available for download for Windows, Mac and Linux. This update includes the open-source R 3.5.0 engine, which is a major update with many new capabilities and improvements to R. In particular, it includes a major new framework for handling data in R, with some major behind-the-scenes performance and memory-use benefi...
1604 sym
Hotfix for Microsoft R Open 3.5.0 on Linux
On Monday, we learned about a serious issue with the installer for Microsoft R Open on Linux-based systems. (Thanks to Norbert Preining for reporting the problem.) The issue was that the installation and de-installation scripts would modify the system shell, and did not use the standard practices to create and restore symlinks for system applicat...
1734 sym
Detecting unconscious bias in models, with R
There's growing awareness that the data we collect, and in particular the variables we include as factors in our predictive models, can lead to unwanted bias in outcomes: from loan applications, to law enforcement, and in many other areas. In some instances, such bias is even directly regulated by laws like the Fair Housing Act in the US. But eve...
2311 sym 2 img
Interpreting machine learning models with the lime package for R
Many types of machine learning classifiers, not least commonly used techniques like ensemble models and neural networks, are notoriously difficult to interpret. If the model produces a surprising label for any given case, it's difficult to answer the question, “why that label, and not one of the others?” One approach to this dilemma is the t...
2744 sym 2 img
PYPL Language Rankings: Python ranks #1, R at #7 in popularity
The new PYPL Popularity of Programming Languages (June 2018) index ranks Python at #1 and R at #7. Like the similar TIOBE language index, the PYPL index uses Google search activity to rank language popularity. PYPL, however, focuses on people searching for tutorials in the respective languages as a proxy for popularity. By that measure, Python ...
1295 sym 6 img
A guide to working with character data in R
R is primarily a language for working with numbers, but we often need to work with text as well. Whether it’s formatting text for reports, or analyzing natural language data, R provides a number of facilities for working with character data. Handling Strings with R, a free (CC-BY-NC-SA) e-book by UC Berkeley’s Gaston Sanchez, provides an over...
1762 sym 2 img
The Financial Times and BBC use R for publication graphics
While graphics guru Edward Tufte recently claimed that “R coders and users just can't do words on graphics and typography” and need additional tools to make graphics that aren't “clunky”, data journalists at major publications beg to differ. The BBC has been creating graphics “purely in R” for some time, with a typography style matchin...
1501 sym 6 img
Global Migration, animated with R
The animation below, by Shanghai University professor Guy Abel, shows migration within and between regions of the world from 1960 to 2015. The data and the methodology behind the chart are described in this paper. The curved bars around the outside represent the peak migrant flows for each region; globally, migration peaked during the 2005-2010 pe...
1261 sym 2 img
R 3.5.1 update now available
Last week the R Core Team released the latest update to the R statistical data analysis environment, R version 3.5.1. This update (codenamed “Feather Spray” — a Peanuts reference) makes no user-visible changes and fixes a few bugs. It is backwards-compatible with R 3.5.0, and users can find updates for Windows, Linux and Mac systems at thei...
1007 sym
In case you missed it: June 2018 roundup
In case you missed them, here are some articles from June of particular interest to R users. An animated visualization of global migration, created in R by Guy Abel. My take on the question, Should you learn R or Python for data science? The BBC and Financial Times use R — without post-processing — for publication graphics. “Handling String...
1810 sym