Publications by David Smith
R 3.3.3 now available
The R core group announced today the release of R 3.3.3 (code-name: “Another Canoe”). As the wrap-up release of the R 3.3 series, this update mainly contains minor bug-fixes. (Bigger changes are planned for R 3.4.0, expected in mid-April.) Binaries for the Windows version are already up on the CRAN master site, and binaries for all platforms w...
In case you missed it: February 2017 roundup
In case you missed them, here are some articles from February of particular interest to R users. Public policy researchers use R to predict neighbourhoods in US cities subject to gentrification. The ggraph package provides a grammar-of-graphics framework for visualizing directed and undirected graphs. Facebook has open-sourced the “prophet”...
The Rise of Civilization, Visualized with R
This animation by geographer James Cheshire shows something at once simple and profound: the founding and growth of the cities of the world since the dawn of civilization. Dr Cheshire created the animation using R and the rworldmap package, using data from this Nature dataset. The complete R code is provided in the blog post linked below, and yo...
Updates to the Data Science Virtual Machine for Linux
The Data Science Virtual Machine (DSVM) is a virtual machine image on the Azure Marketplace assembled for data scientists. The goal of the DSVM is to provide a broad array of popular data-oriented tools in a single environment and to make data scientists and developers highly productive in their work. It's available for both Windows and Linux, and th...
Benchmarking rxNeuralNet for OCR
The MicrosoftML package introduced with Microsoft R Server 9.0 added several new functions for high-performance machine learning, including rxNeuralNet. Tomaz Kastrun recently applied rxNeuralNet to the MNIST database of handwritten digits to compare its performance with two other machine learning packages, h2o and xgboost. The results are summ...
Neural Networks: How they work, and how to train them in R
With the current focus on deep learning, neural networks are all the rage again. (Neural networks have been described for more than 60 years, but it wasn't until the power of modern computing systems became available that they were successfully applied to tasks like image recognition.) Neural networks are the fundamental predictive engin...
Book Review: Testing R Code
When it comes to getting things right in data science, most of the focus goes to the data and the statistical methodology used. But when a misplaced parenthesis can throw off your results entirely, ensuring correctness in your programming is just as important. A new book published by CRC Press, Testing R Code by Richard (Richie) Cotton, provide...
Data Science at StitchFix
If you want to see a great example of how data science can inform every stage of a business process, from product concept to operations, look no further than Stitch Fix's Algorithms Tour. Scroll down through this explainer to see how this personal styling service uses data and statistical inference to suggest clothes their customers will love, sh...
Alteryx integrates with Microsoft R
You can now use Alteryx Designer, the data science workflow tool from Alteryx, as a drag-and-drop interface for many of the big-data statistical modeling tools included with Microsoft R. Alteryx v11.0 includes expanded support for Microsoft SQL Server 2016, Microsoft R Server, Azure SQL Data Warehouse, and Microsoft Analytics Platform System (AP...
Give a talk about an application of R at EARL
The EARL (Enterprise Applications of R) conference is one of my favourite events to go to. As the name of the conference suggests, the focus of the conference is where the rubber of the R language meets the road of it being used to solve real-world problems. Prior conferences have included presentations on how Maersk uses R to optimize capacity f...