Publications by David Smith
Pirating Pirate Data for Pirate Day
This past Tuesday was Talk Like A Pirate Day, the unofficial holiday of R (aRRR!) users worldwide. In recognition of the day, Bob Rudis used R to create this map of worldwide piracy incidents from 2013 to 2017. The post provides a useful and practical example of extracting data from a website without an API, otherwise known as “scraping” ...
Tutorial: Launch a Spark and R cluster with HDInsight
If you'd like to get started using R with Spark, you'll need to set up a Spark cluster and install R and all the other necessary software on the nodes. A really easy way to achieve that is to launch an HDInsight cluster on Azure, which is just a managed Spark cluster with some useful extra components. You'll just need to configure the components ...
News Roundup from Microsoft Ignite
It's been a big day for the team here at Microsoft, with a flurry of announcements from the Ignite conference in Orlando. We'll provide more in-depth details in the coming days and weeks, but for now here's a brief roundup of the news related to data science: Microsoft ML Server 9.2 is now available. This is the new name for what used to be calle...
Meet the new Microsoft R Server: Microsoft ML Server 9.2
Microsoft R Server has received a new name and a major update: Microsoft ML Server 9.2 is now available. ML Server provides a scalable production platform for R — and now Python — programs. The basic idea is that a local client can push R or Python code and have it operationalized on the remote server. ML Server is also included with the Data...
Featurizing images: the shallow end of deep learning
by Bob Horton and Vanja Paunic, Microsoft AI and Research Data Group. Training deep learning models from scratch requires large data sets and significant computational resources. Using pre-trained deep neural network models to extract relevant features from images allows us to build classifiers using standard machine learning approaches that work w...
R 3.4.2 is released
The R Core team today announced the release of R 3.4.2. This release fixes a number of minor bugs and also includes a performance improvement to the commonly used function c() when applied to vectors with a names attribute. Like all minor releases, this release is backwards compatible with prior releases in the R 3.4.x series. Binary builds of R 3....
Convert hand-drawn equations to LaTeX with the mathpix package
Statistics involves a lot of mathematics, so one of the nice things about report-generation systems for R like Rmarkdown is that they make it easy to include nicely formatted equations using LaTeX syntax. So, if we want to include the density function of the Gaussian (Normal) distribution: $$ \frac{1}{{\sigma \sqrt {2\pi } }} e^ { - \...
Comparing assault death rates in the US to other advanced democracies
In an effort to provide context to the frequent mass shootings in the United States, Kieran Healy (Associate Professor of Sociology at Duke University) created this updated chart comparing assault death rates in the US to those of 23 other advanced democracies. The chart shows the rate (per 100,000 citizens) of death caused by assaults (stabbings,...
Create Powerpoint presentations from R with the OfficeR package
For many of us data scientists, whatever tools we use to conduct research or perform an analysis, our superiors are going to want the results as a Microsoft Office document. Most likely it's a Word document or a PowerPoint presentation, and it probably has to follow the corporate branding guidelines to boot. The OfficeR package, by David Gohel, ...
Introducing the Deep Learning Virtual Machine on Azure
A new member has just joined the family of Data Science Virtual Machines on Azure: The Deep Learning Virtual Machine. Like other DSVMs in the family, the Deep Learning VM is a pre-configured environment with all the tools you need for data science and AI development pre-installed. The Deep Learning VM is designed specifically for GPU-enabled inst...