Publications by David Smith
Predicting flu deaths with R
As Google learned, predicting the spread of influenza, even with mountains of data, is notoriously difficult. Nonetheless, bioinformatician and R user Shirin Glander has created a two-part tutorial about predicting flu deaths with R (part 2 here). The analysis is based on just 136 cases of influenza A H7N9 in China in 2013 (data provided in ...
2655 sym 2 img
Mixed Integer Programming in R with the ompr package
Numerical optimization is an important tool in the data scientist's toolbox. Many classical statistical problems boil down to finding the highest (or lowest) point on a multi-dimensional surface: the base R function optim provides many techniques for solving such maximum likelihood problems. Counterintuitively, numerical optimizations are easiest...
2782 sym R (184 sym/1 pcs) 2 img
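As a rough illustration of the "highest (or lowest) point on a surface" idea in the teaser above, here is a minimal sketch in Python (the post itself works in R with `optim` and the ompr package, neither of which is reproduced here). It fits the rate of an exponential distribution by numerically minimizing the negative log-likelihood with a hand-written golden-section search. The data values, the search bounds, and the function names are all made up for illustration.

```python
import math

def neg_log_likelihood(rate, data):
    """Negative log-likelihood of an exponential model with the given rate."""
    return -(len(data) * math.log(rate) - rate * sum(data))

def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

# Hypothetical waiting times; not taken from the post.
data = [0.5, 1.2, 0.3, 2.0, 0.9, 1.1]

# Numerical maximum likelihood estimate of the rate (assumed bounds).
rate_hat = golden_section_minimize(lambda r: neg_log_likelihood(r, data), 1e-6, 10.0)

# For this model the estimate is also available in closed form: n / sum(x).
closed_form = len(data) / sum(data)
print(rate_hat, closed_form)
```

The closed-form answer is only there as a sanity check; the point of a numerical optimizer is that it still works when no such formula exists.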
Interactive decision trees with Microsoft R
Even though ensembles of trees (random forests and the like) generally have better predictive power and robustness, fitting a single decision tree to data can often be very useful for: understanding the important variables in a data set; exploring unusual subsegments of the data (and the explanatory variables that define them); presenting a simpl...
1632 sym 2 img
Take a Test Drive of the Linux Data Science Virtual Machine
If you've been thinking about trying out the Data Science Virtual Machine on Linux, but don't yet have an Azure account, you can now take a free test drive — no credit card required! Just visit the Linux DSVM Marketplace page and click the blue button: The Linux Data Science Virtual Machine includes all of the tools a modern data scientist nee...
1256 sym 2 img
Merry ChRistmas!
Christmas Day is soon upon us, so here's a greeting made with R: Each frame is a Voronoi tessellation: about 1,000 points are chosen across the plane, each of which generates a polygon comprising the region closer to it than to any other selected point. This process is repeated for three designs (a heart, the word “Merry”, and the word “Xmas”)...
2160 sym 2 img
The Basics of Bayesian Statistics
Bayesian Inference is a way of combining information from data with things we think we already know. For example, if we wanted to get an estimate of the mean height of people, we could use our prior knowledge that people are generally between 5 and 6 feet tall to inform the results from the data we collect. If our prior is informative and we don'...
1526 sym
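The mean-height example in the teaser above can be sketched with the standard normal-normal conjugate update. This is a minimal Python illustration, not code from the post: the prior (mean 5.5 ft, standard deviation 0.25 ft, chosen so that roughly 95% of prior mass sits between 5 and 6 feet), the assumed known measurement spread, and the three height values are all assumptions made up for the example.

```python
def posterior_normal_mean(prior_mean, prior_sd, data, obs_sd):
    """Posterior mean and sd for a normal mean with known observation sd.

    Uses the conjugate result: precisions (1 / variance) add, and the
    posterior mean is the precision-weighted average of the prior mean
    and the sample mean.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(data) / obs_sd ** 2
    post_prec = prior_prec + data_prec
    sample_mean = sum(data) / len(data)
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, post_prec ** -0.5

# Hypothetical measurements in feet; not taken from the post.
heights = [5.9, 6.0, 5.8]

post_mean, post_sd = posterior_normal_mean(
    prior_mean=5.5,  # assumed prior: people are "between 5 and 6 feet"
    prior_sd=0.25,   # assumed: about two sd either side covers 5 to 6 ft
    data=heights,
    obs_sd=0.3,      # assumed known spread of individual heights
)
print(post_mean, post_sd)
```

With only three observations the posterior mean lands between the prior mean and the sample mean, and it moves toward the sample mean as more data arrive.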
Combine choropleth data with raster maps using R
Switzerland is a country with lots of mountains, and several large lakes. While the political subdivisions (called municipalities) cover the high mountains and lakes, nothing much of economic interest happens in these places. (Raclette and sailing are wonderful, but don't count for our purposes.) For this reason, the Swiss Federal Statistical Of...
2736 sym 4 img
Using R to prevent food poisoning in Chicago
There are more than 15,000 restaurants in Chicago, but fewer than 40 inspectors tasked with making sure they comply with food-safety standards. To help prioritize the facilities targeted for inspection, the City of Chicago used R to create a model that predicts which restaurants are most likely to fail an inspection. Using this model to deploy in...
2211 sym
Power BI custom visuals, based on R
You've been able to include user-defined charts using R in Power BI dashboards for a while now, but a recent update to Power BI includes seven new custom charts based on R in the custom visuals gallery. You can see the new chart types by visiting the Power BI Custom Visuals Gallery and clicking on the “R-powered visuals” tab. The new custom...
1745 sym 2 img
The biggest R stories from 2016
It's been another great year for the R project and the R community. Let's look at some of the highlights from 2016. The R 3.3 major release brought some significant performance improvements to R, along with a spiffy new logo. There were also two updates in 2016: R 3.3.1 and R 3.3.2. (The R 3.2 series also received an update with R 3.2.4.) The R ...
3024 sym 2 img