Publications by David Smith
dv01 uses R to bring greater transparency to the consumer lending market
The founder of the NYC-based startup dv01 watched the 2008 financial crisis and was inspired to bring greater transparency to institutional investors in the consumer lending market. Despite being an open-source shop, they switched their data services to Microsoft SQL Server to provide better performance (reducing latency for queries from tens of...
Where Europe lives, in 14 lines of R Code
Via Max Galka, always a great source of interesting data visualizations, we have this lovely visualization of population density in Europe in 2011, created by Henrik Lindberg. Impressively, the chart was created with just 14 lines of R code. (To recreate it yourself, download the GEOSTAT-grid-POP-1K-2011-V2-0-1.zip file from Eurostat, and move ...
Make pleasingly parallel R code with rxExecBy
Some things are easy to convert from a long-running sequential process to a system where each part runs at the same time, thus reducing the required time overall. We often call these “embarrassingly parallel” problems, but given how easy it is to reduce the time it takes to execute them by converting them into a parallel process, “pleasingl...
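To make the idea of an embarrassingly parallel problem concrete, here is a minimal sketch. It does not use rxExecBy; it swaps in the `parallel` package that ships with base R. The function `slow_square`, the input vector, and the choice of two workers are all hypothetical, chosen only for illustration.

```r
library(parallel)

# Hypothetical stand-in for a long-running task: each call is
# independent of the others, which is what makes the problem
# "embarrassingly parallel".
slow_square <- function(x) {
  Sys.sleep(0.1)  # simulate work
  x^2
}

inputs <- 1:8

# Sequential version: one call after another.
sequential <- lapply(inputs, slow_square)

# Parallel version: the same calls spread over two worker processes.
# (Two workers is an arbitrary choice for this sketch.)
cl <- makeCluster(2)
parallel_result <- parLapply(cl, inputs, slow_square)
stopCluster(cl)

# Both approaches should produce the same list of results.
print(identical(sequential, parallel_result))
```

Because no task depends on another, the only change between the two versions is which apply function does the looping.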
Using Microsoft R with Alteryx
Alteryx Designer, the self-service analytics workflow tool, recently added integration with Microsoft R. This allows you to train models provided by Microsoft R, and create predictions from them, without needing to write R code — you simply drag-and-drop to create a workflow. In a recent post at the Microsoft R blog, Bharath Sankaranarayan wal...
The Datasaurus Dozen
There's a reason why data scientists spend so much time exploring data using graphics. Relying only on data summaries like means, variances, and correlations can be dangerous, because wildly different data sets can give similar results. This is a principle that has been demonstrated in statistics classes for decades with Anscombe's Quartet: fou...
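The point about summary statistics can be checked directly, since Anscombe's Quartet ships with base R as the `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`). The sketch below is an illustration rather than code from the post, and the name `summaries` is hypothetical.

```r
# Anscombe's Quartet is a built-in dataset in base R.
data(anscombe)

# For each of the four x/y pairs, compute the usual summaries.
summaries <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x  = var(x),  var_y  = var(y),
    cor_xy = cor(x, y))
})
colnames(summaries) <- paste0("set", 1:4)

# The four columns come out nearly identical, even though plots of
# the four data sets look very different.
print(round(summaries, 2))
```

Plotting each pair (for example with `plot(anscombe$x1, anscombe$y1)`) is what reveals the differences that the summaries hide.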
Technical Foundations of Informatics: A modern introduction to R
Informatics (or Information Science) is the practice of creating, storing, finding, manipulating and sharing information. These are all tasks that the R language was designed for, and so Technical Foundations of Informatics, the online course guide for the University of Washington course of the same name, also provides an excellent resource for...
Real-time scoring with Microsoft R Server 9.1
Once you've built a predictive model, in many cases the next step is to operationalize the model: that is, generate predictions from the pre-trained model in real time. In this scenario, latency becomes the critical metric: new data typically become available a single row at a time, and it's important to respond with that single prediction (or s...
Predicting Hospital Length of Stay using SQL Server R Services
Last week, my Microsoft colleagues Bharath Sankaranarayan and Carl Saroufim presented a live webinar showing how you can predict a patient's length of stay at a hospital using SQL Server R Services. The recorded webinar is available for on-demand viewing now. (Registration is required to view.) The webinar is based on the Machine Learning Solut...
Stack Overflow Trends
Developer Q&A site Stack Overflow recently introduced Stack Overflow Trends, a useful tool for tracking the growth and decline in the rate of questions asked on various topics (by their Stack Overflow tag). For example, you can see that activity around both R and Python has been increasing over the past 8 years. As you'd expect from a general pu...
Analyzing data on CRAN packages
There's a handy new function in R 3.4.0 for anyone interested in data about CRAN packages. It's not documented, but it's pretty simple: tools::CRAN_package_db() returns a data frame with one row for every package on CRAN and 65 columns of data on those packages, as shown below. > names(tools::CRAN_package_db()) [1] "Package" "Ver...