Publications by David Smith

Microsoft Data Science VM now available as a Linux instance

13.04.2016

Microsoft's Linux Data Science Virtual Machine is now available for use on the Azure Marketplace. Like the Windows-based instance of the Data Science VM, this pre-built system based on Linux CentOS 7.2 includes all the tools you'll need to analyze data, including Microsoft R Open, Anaconda Python, Jupyter Notebooks and a PostgreSQL database inst...

A scalable data science platform with Microsoft R Server and Spark

18.04.2016

If you want to train a statistical model on very large amounts of data, you'll need three things: a storage platform capable of holding all of the training data, a computational platform capable of efficiently performing the heavy-duty mathematical computations required, and a statistical computing language with algorithms that can take advantage...

Exploring NYC Taxi Data with Microsoft R Server and HDInsight

19.04.2016

As I mentioned yesterday, Microsoft R Server is now available for HDInsight, which means that you can now run R code (including the big-data algorithms of Microsoft R Server) on a managed, cloud-based Hadoop instance. Debraj GuhaThakurta, Senior Data Scientist, and Shauheen Zahirazami, Senior Machine Learning Engineer at Microsoft, demonstrate som...

Pride and Prejudice and Z-scores

20.04.2016

You might think literary criticism is no place for statistical analysis, but given digital versions of the text you can, for example, use sentiment analysis to infer the dramatic arc of an Oscar Wilde novel. Now you can apply similar techniques to the works of Jane Austen thanks to Julia Silge's R package janeaustenr (available on CRAN). The pa...

Microsoft R Open 3.2.4 now available

22.04.2016

Microsoft R Open 3.2.4, Microsoft's enhanced distribution of R, is now available for download from mran.microsoft.com. This update is based on R 3.2.4-revised, and includes several improvements and some minor bug fixes from the R Core Group. Improvements include long-vector support for the smooth function, a new stringsAsFactors option when us...

Webinar April 28: Effective Graphs with Microsoft R Open

25.04.2016

Naomi Robbins, author of Creating More Effective Graphs and a Forbes contributor, has teamed up with her daughter, Dr Joyce Robbins, to present a new webinar this Thursday, April 28: Creating Effective Graphs with Microsoft R Open. The webinar will demonstrate how to create a variety of useful graphics with R: comparisons, distributions, trends over time,...

Tufte-style graphics in R

29.04.2016

It's not an overstatement to say that, at least for me personally, Edward Tufte's book The Visual Display of Quantitative Information was transformative. Reading this book made me (and, I feel confident saying, many, many other data scientists) passionate about visualizing data. This is the book that popularized Minard's chart depicting Napoleon's m...

Because it’s Friday: The time-travelling jukebox

03.05.2016

If you're looking for some musical nostalgia this weekend, look no further than How Music Taste Evolved, from design firm Polygraph (and with a hat tip to Pogue). Choose any month from the past six decades, and then sit back and watch the top 5 songs on the Billboard chart of the day move up and down and listen to the top song as time progresses....

R 3.3.0 now available

05.05.2016

R 3.3.0, a major annual update to the R language, was released earlier this week and is now available from your local CRAN mirror for Windows, Mac (OS X 10.6 or later) and Linux systems. (Or, as always, you can build it yourself from sources.) This update, codenamed “Supposedly Educational”, makes a number of significant improvements ...

R Tools for Visual Studio 0.3 now available

06.05.2016

R Tools for Visual Studio, the open-source extension to Visual Studio that provides an IDE for the R language, has been upgraded to include several new features. The latest update, RTVS 0.3, now includes: An R package manager, allowing you to review, install, and uninstall packages using a convenient user interface. The Variable Explorer now ...
