Publications by Guest Blogger
Parallelizing Data Analytics on Azure with the R Interface Tool
by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft). In data science, developing a model with optimal performance often requires exploratory experiments on different sets of hyper-parameters. Preliminary analyses on smaller data can be performed on a single machine, while the experimental one on...
7044 sym R (2573 sym/5 pcs) 4 img 1 tbl
The Flexibility of Remote and Local R Workspaces
by Sean Wells, Senior Software Engineer, Microsoft. The mrsdeploy R package facilitates Remote Execution and Web Service interactions from your local R IDE command line against a remote Microsoft R Server instance. Both core features can be used independently of one another or combined to support different convenient workflows. These different wor...
3672 sym R (544 sym/5 pcs) 6 img 2 tbl
Employee Retention with R Based Data Science Accelerator
by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft). Employee retention has been, and will continue to be, one of the biggest challenges for a company. While classical tactics such as promotions and competitive perks are used to retain employees, it is now a hot trend to rely on machine lea...
5533 sym R (355 sym/1 pcs) 6 img
AUC Meets the Wilcoxon-Mann-Whitney U-Statistic
by Bob Horton, Senior Data Scientist, Microsoft. The area under an ROC curve (AUC) is commonly used in machine learning to summarize the performance of a predictive model with a single value. But you might be surprised to learn that the AUC is directly connected to the Mann-Whitney U-Statistic, which is commonly used in a robust, non-parametric al...
7955 sym R (4150 sym/9 pcs) 8 img
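The connection named in the excerpt above can be checked numerically. The following is a minimal sketch in Python, not the code from the post itself (which, judging by the listing, uses R). The function names are illustrative assumptions, and the example scores and labels are made up. It computes the AUC two ways: as the fraction of (positive, negative) pairs in which the positive case scores higher, and from the rank-sum form of the Mann-Whitney U statistic divided by the number of such pairs.

```python
def auc_by_pairs(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc_by_u_statistic(scores, labels):
    """AUC as U / (n_pos * n_neg), with U taken from the rank sum of the positives."""
    ranks = average_ranks(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Made-up example data: 1 marks a positive case, 0 a negative case.
example_scores = [0.1, 0.4, 0.35, 0.8]
example_labels = [0, 0, 1, 1]
print(auc_by_pairs(example_scores, example_labels))
print(auc_by_u_statistic(example_scores, example_labels))
```

Both calls print the same value for this data, which is the point of the identity: the U statistic counts correctly ordered pairs, and dividing by the total number of pairs turns that count into the AUC.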
Running your R code on Azure with mrsdeploy
by John-Mark Agosta, data scientist manager at Microsoft. Let’s say you’ve built a model in R that is larger than you can conveniently run locally, and you want to take advantage of Azure’s resources simply to run it on a larger machine. This blog explains how to provision and run an Azure virtual machine (VM) for this, using the mrsdeploy l...
13796 sym R (5283 sym/23 pcs) 8 img
Massively-parallel computations on Azure clusters with R, made easy with doAzureParallel
by JS Tan (Program Manager, Microsoft). For users of the R language, scaling up their work to take advantage of cloud-based computing has generally been a complex undertaking. We are therefore excited to announce doAzureParallel, a lightweight R package built on Azure Batch that allows you to easily use Azure’s flexible compute resources right f...
3416 sym 4 img
R is for Archaeology: A report on the 2017 Society of American Archaeology meeting
by Ben Marwick, Associate Professor of Archaeology, University of Washington and Senior Research Scientist, University of Wollongong. The Society for American Archaeology (SAA) is one of the largest professional organisations for archaeologists in the world, and just concluded its annual meeting in Vancouver, BC at the end of March. The R language ...
7800 sym 2 img
AzureDSVM: a new R package for elastic use of the Azure Data Science Virtual Machine
by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft). The Azure Data Science Virtual Machine (DSVM) is a curated VM which provides commonly-used tools and software for data science and machine learning, pre-installed. AzureDSVM is a new R package that enables seamless interaction with the DSVM from a l...
2968 sym R (1553 sym/5 pcs)
Who is the caretaker? Evidence-based probability estimation with the bnlearn package
by Juan M. Lavista Ferres, Senior Director of Data Science at Microsoft. In what was one of the most viral episodes of 2017, political science Professor Robert E. Kelly was live on BBC World News talking about the South Korean president being forced out of office when both his kids decided to take an easy path to fame by showing up in their dad’...
3492 sym R (2629 sym/3 pcs) 2 img
XGBoost support added to Rattle
by Fang Zhou, Data Scientist, and Graham Williams, Director of Data Science, both at Microsoft. Rattle (the R Analytical Tool To Learn Easily) is a popular open-source GUI for data mining using R. It presents statistical and visual summaries of data, transforms data so that it can be readily modelled, builds both unsupervised and supervised models ...
3257 sym 8 img