Publications by David Smith

Statistical Machine Learning with Microsoft ML

23.10.2017

MicrosoftML is an R package for machine learning that works in tandem with the RevoScaleR package. (In order to use the MicrosoftML and RevoScaleR libraries, you need an installation of Microsoft Machine Learning Server or Microsoft R Client.) A great way to see what MicrosoftML can do is to take a look at the on-line book Machine Learning with ...

1439 sym

Create editable Microsoft Office charts from R

24.10.2017

R has a rich and infinitely flexible graphics system, and you can easily embed R graphics into Microsoft Office documents like PowerPoint or Word. The one thing I dread hearing after delivering such a document, though, is “how can I tweak that graphic?”. I could change the colors or fonts or dimensions in R, of course, but sometimes people ju...

3246 sym R (295 sym/1 pcs) 6 img

Two upcoming webinars

25.10.2017

Two new Microsoft webinars are taking place over the next week that may be of interest: AI Development in Azure using Data Science Virtual Machines. The Azure Data Science Virtual Machine (DSVM) provides a comprehensive development and production environment to Data Scientists and AI-savvy developers. DSVMs are specialized virtual machine images...

2382 sym

Microsoft R Open 3.4.2 now available

27.10.2017

Microsoft R Open (MRO), Microsoft's enhanced distribution of open source R, has been upgraded to version 3.4.2 and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to the latest R 3.4.2 and updates the bundled packages. MRO is 100% compatible with all R packages. MRO 3.4.2 points to a fixe...

1595 sym

Recent updates to the Team Data Science Process

30.10.2017

It's been over a year since we first introduced the Team Data Science Process (TDSP). The data, technology and practices behind Data Science continue to evolve, and the TDSP has evolved in parallel. Over the past year, several new facets have been added, including: The IDEAR (Interactive Data Exploration, Analysis and Reporting) fr...

1573 sym 2 img

Survey of Kagglers finds Python, R to be preferred tools

31.10.2017

Competitive predictive modeling site Kaggle conducted a survey of participants in prediction competitions, and the 16,000 responses provide some insights about that user community. (Whether those trends generalize to the wider community of all data scientists is unclear, however.) One question of interest asked what tools Kagglers use at work. P...

1640 sym 2 img

R: the least disliked programming language

01.11.2017

According to a recent analysis of Stack Overflow “Developer Stories”, where programmer candidates list the technologies they would and would not like to work with, R is the least disliked programming language: This is probably related to the fact that there's high demand in the job market for fast-growing technologies, which is a disincentive...

1469 sym 2 img

New RStudio cheat sheet: Strings in R

03.11.2017

The RStudio team has created another very useful cheat sheet for R: Working with Strings. This cheat sheet provides an example-laden menu of operations you can perform on strings (character vectors) in R using the stringr package. While base R provides a solid set of string manipulation functions, the stringr package functions are simpler, more ...

1198 sym 2 img

A history-oriented introduction to R for Excel users

06.11.2017

While spreadsheets are fine tools for collecting and sharing data, the temptation is often there to also use them for in-depth analysis better suited to reproducible systems like R. Historian Jesse Sadler recently published the guide Excel vs R: A Brief Introduction to R, which provides useful advice to data analysts currently using sprea...

2292 sym 2 img

In case you missed it: October 2017 roundup

07.11.2017

In case you missed them, here are some articles from October of particular interest to R users. A recent survey of competitors on the Kaggle platform reveals that Python (76%) and R (59%) are the preferred tools for building predictive models. Microsoft's “Team Data Science Process” has been updated with new guidelines on use of the IDEAR fra...

2798 sym