Publications by Guest Blogger

Parallelizing Data Analytics on Azure with the R Interface Tool

27.12.2016

by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft)

In data science, to develop a model with optimal performance, exploratory experiments on different sets of hyper-parameters are often performed. Preliminary analyses on smaller data can be performed on a single machine, while the experimental one on...

7044 sym R (2573 sym/5 pcs) 4 img 1 tbl

The Flexibility of Remote and Local R Workspaces

04.01.2017

by Sean Wells, Senior Software Engineer, Microsoft

The mrsdeploy R package facilitates Remote Execution and Web Service interactions from your local R IDE command line against a remote Microsoft R Server instance. Both core features can be used independently of one another or combined to support different convenient workflows. These different wor...

3672 sym R (544 sym/5 pcs) 6 img 2 tbl

Employee Retention with R Based Data Science Accelerator

09.03.2017

by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft)

Employee retention has been and will continue to be one of the biggest challenges for a company. While classical tactics such as promotion, competitive perks, etc. are practiced as ways to retain employees, it is now a hot trend to rely on machine lea...

5533 sym R (355 sym/1 pcs) 6 img

AUC Meets the Wilcoxon-Mann-Whitney U-Statistic

15.03.2017

by Bob Horton, Senior Data Scientist, Microsoft

The area under an ROC curve (AUC) is commonly used in machine learning to summarize the performance of a predictive model with a single value. But you might be surprised to learn that the AUC is directly connected to the Mann-Whitney U-Statistic, which is commonly used in a robust, non-parametric al...

7955 sym R (4150 sym/9 pcs) 8 img

Running your R code on Azure with mrsdeploy

22.03.2017

by John-Mark Agosta, data scientist manager at Microsoft

Let’s say you’ve built a model in R that is larger than you can conveniently run locally, and you want to take advantage of Azure’s resources simply to run it on a larger machine. This blog explains how to provision and run an Azure virtual machine (VM) for this, using the mrsdeploy l...

13796 sym R (5283 sym/23 pcs) 8 img

Massively-parallel computations on Azure clusters with R, made easy with doAzureParallel

29.03.2017

by JS Tan (Program Manager, Microsoft)

For users of the R language, scaling up their work to take advantage of cloud-based computing has generally been a complex undertaking. We are therefore excited to announce doAzureParallel, a lightweight R package built on Azure Batch that allows you to easily use Azure’s flexible compute resources right f...

3416 sym 4 img

R is for Archaeology: A report on the 2017 Society of American Archaeology meeting

14.04.2017

by Ben Marwick, Associate Professor of Archaeology, University of Washington and Senior Research Scientist, University of Wollongong

The Society of American Archaeology (SAA) is one of the largest professional organisations for archaeologists in the world, and just concluded its annual meeting in Vancouver, BC at the end of March. The R language ...

7800 sym 2 img

AzureDSVM: a new R package for elastic use of the Azure Data Science Virtual Machine

19.05.2017

by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft)

The Azure Data Science Virtual Machine (DSVM) is a curated VM which provides commonly-used tools and software for data science and machine learning, pre-installed. AzureDSVM is a new R package that enables seamless interaction with the DSVM from a l...

2968 sym R (1553 sym/5 pcs)

Who is the caretaker? Evidence-based probability estimation with the bnlearn package

26.05.2017

by Juan M. Lavista Ferres, Senior Director of Data Science at Microsoft

In what was one of the most viral episodes of 2017, political science Professor Robert E Kelly was live on BBC World News talking about the South Korean president being forced out of office when both his kids decided to take an easy path to fame by showing up in their dad’...

3492 sym R (2629 sym/3 pcs) 2 img

XGBoost support added to Rattle

07.07.2017

by Fang Zhou, Data Scientist, and Graham Williams, Director of Data Science, both at Microsoft

Rattle (the R Analytical Tool To Learn Easily) is a popular open-source GUI for data mining using R. It presents statistical and visual summaries of data, transforms data so that it can be readily modelled, builds both unsupervised and supervised models ...

3257 sym 8 img