Publications by David Smith

Pirating Pirate Data for Pirate Day

21.09.2017

This past Tuesday was Talk Like A Pirate Day, the unofficial holiday of R (aRRR!) users worldwide. In recognition of the day, Bob Rudis used R to create this map of worldwide piracy incidents from 2013 to 2017. The post provides a useful and practical example of extracting data from a website without an API, otherwise known as “scraping” ...

2367 sym 2 img

Tutorial: Launch a Spark and R cluster with HDInsight

22.09.2017

If you'd like to get started using R with Spark, you'll need to set up a Spark cluster and install R and all the other necessary software on the nodes. A really easy way to achieve that is to launch an HDInsight cluster on Azure, which is just a managed Spark cluster with some useful extra components. You'll just need to configure the components ...

1752 sym

News Roundup from Microsoft Ignite

25.09.2017

It's been a big day for the team here at Microsoft, with a flurry of announcements from the Ignite conference in Orlando. We'll provide more in-depth details in the coming days and weeks, but for now here's a brief roundup of the news related to data science: Microsoft ML Server 9.2 is now available. This is the new name for what used to be calle...

2546 sym

Meet the new Microsoft R Server: Microsoft ML Server 9.2

26.09.2017

Microsoft R Server has received a new name and a major update: Microsoft ML Server 9.2 is now available. ML Server provides a scalable production platform for R — and now Python — programs. The basic idea is that a local client can push R or Python code and have it operationalized on the remote server. ML Server is also included with the Data...

1522 sym

Featurizing images: the shallow end of deep learning

27.09.2017

by Bob Horton and Vanja Paunic, Microsoft AI and Research Data Group. Training deep learning models from scratch requires large data sets and significant computational resources. Using pre-trained deep neural network models to extract relevant features from images allows us to build classifiers using standard machine learning approaches that work w...

16828 sym R (5491 sym/23 pcs) 18 img

R 3.4.2 is released

28.09.2017

The R Core team today announced the release of R 3.4.2. This release fixes a number of minor bugs and also includes a performance improvement to the commonly-used function c when applied to vectors with a names attribute. Like all minor releases, this release is backwards compatible with prior releases in the R 3.4.x series. Binary builds of R 3....

1254 sym

Convert hand-drawn equations to LaTeX with the mathpix package

29.09.2017

Statistics involves a lot of mathematics, so one of the nice things about report-generation systems for R like Rmarkdown is that they make it easy to include nicely-formatted equations by using the LaTeX syntax. So, if we want to include the density function of the Gaussian Normal distribution: $$ \frac{1}{{\sigma \sqrt {2\pi } }} e^ { - \...

2122 sym R (74 sym/1 pcs)

Comparing assault death rates in the US to other advanced democracies

02.10.2017

In an effort to provide context to the frequent mass shootings in the United States, Kieran Healy (Associate Professor of Sociology at Duke University) created this updated chart comparing assault death rates in the US to those of 23 other advanced democracies. The chart shows the rate (per 100,000 citizens) of death caused by assaults (stabbings,...

1820 sym 2 img

Create PowerPoint presentations from R with the OfficeR package

03.10.2017

For many of us data scientists, whatever the tools we use to conduct research or perform an analysis, our superiors are going to want the results as a Microsoft Office document. Most likely it's a Word document or a PowerPoint presentation, and it probably has to follow the corporate branding guidelines to boot. The OfficeR package, by David Gohel, ...

2443 sym 2 img

Introducing the Deep Learning Virtual Machine on Azure

04.10.2017

A new member has just joined the family of Data Science Virtual Machines on Azure: The Deep Learning Virtual Machine. Like other DSVMs in the family, the Deep Learning VM is a pre-configured environment with all the tools you need for data science and AI development pre-installed. The Deep Learning VM is designed specifically for GPU-enabled inst...

2288 sym 2 img