Publications by Guest Blogger
Introducing R for Big Data with PivotalR
Written by Hai Qian & Woo J. Jung of Pivotal Data Labs. When discussing data science tools, it’s common for folks to debate passionately about algorithm breadth, scalability, and performance among the many available options. Yet one of the most important aspects to consider when choosing a data science tool, usability, is often ignored in t...
6637 sym R (40 sym/1 pcs) 4 img
A predictive maintenance solution template with SQL Server R Services
by Jaya Mathew, Data Scientist at Microsoft. By using R Services within SQL Server 2016, users can leverage the power of R at scale without having to move their data around. Such a solution is beneficial for organizations that do most of their coding in R but have very sensitive big data that cannot be hosted on any public cloud. To illu...
8130 sym 2 img
How to choose the right tool for your data science project
by Brandon Rohrer, Principal Data Scientist, Microsoft. R or Python? Torch or TensorFlow? (or MXNet or CNTK)? Spark or map-reduce? When we're getting started on a project, the mountain of tools to choose from can be overwhelming. Sometimes it makes me feel small and bewildered, like Alice in Wonderland. Luckily, the Cheshire Cat cut to the heart...
1715 sym 2 img
Building Scalable Data Pipelines with Microsoft R Server and Azure Data Factory
by Udayan Kumar, Data Scientist at Microsoft. Beginning in 2016, Microsoft rolled out a preview of Microsoft R Server (MRS) for Azure HDInsight clusters. This service provides a preconfigured instance of R Server with Spark/Hadoop that can be provisioned within minutes. Recent blog posts (by Max Kaznady and David Smith) have highlighted...
2584 sym 2 img
Estimating the value of a vehicle with R
by Srini Kumar, Director of Data Science at Microsoft. We tend to think of R and other such ML tools only in the context of the workplace, to do “weighty” things aimed at saving millions. A little judicious use of R may help us hugely in our personal lives too. The ideas of regression, classification trees, etc. can be powerful tools in valuati...
4119 sym 4 img
Sharing our R Programs — With Style
by Graham Williams, Director of Data Science, Microsoft. Programming is an art and a way we express ourselves. As we write our programs we should keep in mind that someone else is very likely to be reading them. We can facilitate the accessibility of our programs through a clear presentation of the messages we are sharing. As data scientists we als...
3003 sym
Data Manipulation with sparklyr on Azure HDInsight
by Ali Zaidi, Data Scientist at Microsoft. Apache Spark and a Tale of APIs: Spark is an exceptionally popular processing engine for distributed data. Dealing with data in distributed storage and programming with concurrent systems often requires learning complicated new paradigms and techniques. Statisticians and data scientists familiar with R are ...
8537 sym R (7075 sym/16 pcs) 6 img
SAS to R Migration for Financial Data: Lessons and Examples
by Lixun Zhang (Data Scientist), Ye Xing (Senior Data Scientist) and Tao Wu (Principal Data Scientist Manager), all at Microsoft. Editor's Note: To learn more about migrating from SAS to R, there will be a live webinar presented by Lixun and Ye tomorrow (Tuesday, November 15). Register to attend the webinar here. R has been gaining in populari...
3054 sym 6 img
Calculating AUC: the area under a ROC Curve
by Bob Horton, Microsoft Senior Data Scientist. Receiver Operating Characteristic (ROC) curves are a popular way to visualize the tradeoffs between sensitivity and specificity in a binary classifier. In an earlier post, I described a simple “turtle’s eye view” of these plots: a classifier is used to sort cases in order from most to least l...
11931 sym R (3943 sym/10 pcs) 4 img 1 tbl
Using R to Gain Insights into the Emotional Journeys in War and Peace
by Wee Hyong Tok, Senior Data Scientist Manager at Microsoft. How do you read a novel in record time, and gain insights into the emotional journey of the main characters, as they go through various trials and tribulations, as an exciting story unfolds from chapter to chapter? I remember my experiences when I start reading a novel, and I get intrigue...
4276 sym R (2072 sym/7 pcs) 8 img