Publications by David Smith
Catterplots: Plots with cats
As a devotee of Tufte, I'm generally against chartjunk. Graphical elements that obscure interpretation of the data occasionally have a useful role to play, but more often than not that role is to entertain at the expense of enlightenment, or worse, to actively mislead. So it's with mixed feelings that I refer you to catterplots, an R package by Davi...
Finding Radiohead’s most depressing song, with R
Radiohead is known for having some fairly maudlin songs, but of all of their tracks, which is the most depressing? Data scientist and R enthusiast Charlie Thompson ranked all of their tracks according to a “gloom index”, and created the following chart of gloominess for each of the band's nine studio albums. (Click for the interactive versio...
The difference between R and Excel
If you're an Excel user (or any other spreadsheet, really), learning R can be hard. As this blog post by Gordon Shotwell explains, one of the reasons is that simple things can be harder to do in R than in Excel. But it's worth persevering, because complex things can be easier. While Excel (ahem) excels at things like arithmetic and tabul...
Preview: R Tools for Visual Studio 1.0
After more than a year in preview, R Tools for Visual Studio (RTVS), the open-source extension to the Visual Studio IDE for R programming, is nearing its official release. RTVS Release Candidate 1 is now available for download, giving you the opportunity to try out the new features ahead of the official announcement. We'll cover the features in detail ...
Prophet: How Facebook operationalizes time series forecasting at scale
Facebook is a famously data-driven organization, and an important goal in any data science activity is forecasting. Now, Facebook has released Prophet, an open-source package for R and Python that implements the time-series methodology that Facebook uses in production for forecasting at scale. Prophet has a very simple interface: you pass it a ...
ggraph: ggplot for graphs
A graph, a collection of nodes connected by edges, is just data. Whether it's a social network (where nodes are people, and edges are friend relationships), or a decision tree (where nodes are branch criteria or values, and edges are decisions), the nature of the graph is easily represented in a data object. It might be represented as a matrix (where...
Forecasting gentrification in city neighborhoods, with R
If you've lived in a big city, you're likely familiar with the impact of gentrification. For longtime residents of a neighbourhood, it can represent a decline in the culture and vibrancy of your community; for recent or prospective residents, it can represent a financial opportunity in rising home prices. For those that live in a gentrifying neig...
Scholarships encourage diversity at useR!2017
While representation of women and minorities at last year's useR! conference was the highest it's ever been, there is always room for more diversity. To encourage more underrepresented individuals to attend, the useR! committee has taken several steps, including asking attendees to adhere to a supportive code of conduct and providing childcare...
Predicting the length of a hospital stay, with R
I haven't been admitted to hospital many times in my life, but every time the only thing I really cared about was: when am I going to get out? It's also a question that weighs heavily on hospital managers: by knowing ahead of time how long each patient's stay is likely to be, they can better manage facilities and staff, and know whether the hospi...
Find modern, interactive web-based charts for R at the htmlwidgets gallery
While R's base graphics library is almost limitlessly flexible when it comes to creating static graphics and data visualizations, new Web-based technologies like d3 and WebGL open up new horizons in high-resolution, rescalable and interactive charts. Graphics built with these libraries can easily be embedded in a webpage, can be dynamically resized...