Publications by David Smith
Studying disease with R: RECON, The R Epidemics Consortium
For almost a year now, a collection of researchers from around the world has been collaborating to develop the next generation of analysis tools for disease outbreak response using R. The R Epidemics Consortium (RECON) creates R packages for handling, visualizing, and analyzing outbreak data using cutting-edge statistical methods, along with ge...
Demo: Real-Time Predictions with Microsoft R Server
At the R/Finance conference last month, I demonstrated how to operationalize models developed in Microsoft R Server as web services using the mrsdeploy package. Then, I used that deployed model to generate predictions for loan delinquency, using a Python script as the client. (You can see slides here, and a video of the presentation below.) With...
Applications of R at EARL San Francisco 2017
The Mango team held their first instance of the EARL conference series in San Francisco last month, and it was a fantastic showcase of real-world applications of R. This was a smaller version of the EARL conferences in London and Boston, but with that came the opportunity to interact with R users from industry in a more intimate setting. Hopefull...
Using sparklyr with Microsoft R Server
The sparklyr package (by RStudio) provides a high-level interface between R and Apache Spark. Among many other things, it allows you to filter and aggregate data in Spark using the dplyr syntax. In Microsoft R Server 9.1, you can now connect to a Spark session using the sparklyr package as the interface, allowing you to combine the data-prepara...
R leads, Python gains in 2017 Burtch Works Survey
For the past four years, recruiting firm Burtch Works has conducted a simple survey of data scientists with just one question: “Which do you prefer to use: SAS, R or Python?” The results for this year's survey of 1,046 respondents are in: R: 40% (2016: 42%); SAS: 34% (2016: 39%); Python: 26% (2016: 20%). Compared to last year's results, Pyt...
Updated Data Science Virtual Machine for Windows: GPU-enabled with Docker support
The Windows edition of the Data Science Virtual Machine (DSVM), the all-in-one virtual machine image with a wide collection of open-source and Microsoft data science tools, has been updated to the Windows Server 2016 platform. This update brings built-in support for Docker containers and GPU-based deep learning. GPU-based Deep Learning. While...
Interactive R visuals in Power BI
Power BI has long had the capability to include custom R charts in dashboards and reports. But in sharp contrast to standard Power BI visuals, these R charts were static. While R charts would update when the report data was refreshed or filtered, it wasn't possible to interact with an R chart on the screen (to display tool-tips, for example). But...
The R community is one of R’s best features
R is incredible software for statistics and data science. But while the bits and bytes of software are an essential component of its usefulness, software needs a community to be successful. And that's an area where R really shines, as Shannon Ellis explains in this lovely rOpenSci blog post. For software, a thriving community offers developers, e...
Useful tricks when including images in Rmarkdown documents
Rmarkdown is an enormously useful system for combining text, output and graphics generated by R into a single document. Images, in particular, are a powerful means of communication in a report, whether they be data visualizations, diagrams, or pictures. To maximize the power of those images, Zev Ross has created a comprehensive list of tips and t...
How R is used by the FDA for regulatory compliance
I was recently alerted (thanks Maëlle and Mikhail!) to an enlightening presentation from last year's useR! conference. (This year's useR! conference takes place next week in Belgium.) Paul H Schuette, Scientific Computing Coordinator at the FDA Center for Drug Evaluation and Research (CDER), talked about how R is used in the process of regulatin...