Publications by David Smith

Highlights of the Data Science Track at Microsoft Ignite

21.08.2017

I will be at the AI Summit in San Francisco next month, which means I can't make it to Ignite in Orlando this year. That's a bit of a shame, because there's a fantastic Data Science track at Ignite. There are 25 sessions on offer, with presentations from my Microsoft colleagues on Microsoft R, Cognitive Toolkit, Bot Framework, the Team Data Sc...

1576 sym

Gender roles in film direction, analyzed with R

22.08.2017

What do women do in films? If you analyze the stage directions in film scripts (as Julia Silge, Russell Goldenberg and Amber Thomas have done for this visual essay for The Pudding), it seems that women (but not men) are written to snuggle, giggle and squeal, while men (but not women) shoot, gallop and strap things to other things. This i...

1739 sym 2 img

Recreating and updating Minard with ggplot2

23.08.2017

Minard's chart depicting Napoleon's 1812 march on Russia is a classic of data visualization that has inspired many homages using different time-and-place data. If you'd like to recreate the original chart, or create one of your own, Andrew Heiss has created a tutorial on using the ggplot2 package to re-envision the chart in R: The R script prov...

1336 sym 4 img

Tips and tricks on using R to query data in Power BI

25.08.2017

In Power BI, the dashboarding and reporting tool, you can use R to filter, transform, or restructure data via the Query Editor. For example, you could use the mice package to impute missing values, or use the tidytext package to assign sentiment scores to text inputs. As Imke Feldmann explains, there are lots of useful tricks you can accomplish u...

1165 sym

Packages to simplify mapping in R

28.08.2017

Computerworld's Sharon Machlis has published a very useful tutorial on creating geographic data maps with R. (The tutorial was actually published back in March, but I only came across it recently.) While it's been possible to create maps in R for a long time, some recent packages and data APIs have made the process much simpler. The tutorial is b...

1924 sym 2 img

3-D animations with R

30.08.2017

R is often used to visualize and animate 2-dimensional data. (Here are just a few examples.) But did you know you can create 3-dimensional animations as well? As Thomas Lin Pedersen explains in a recent blog post, the trick is in using the persp function to translate points in 3-D space into a 2-D projection. This function is normally used t...

2072 sym 4 img

Probably more likely than probable

30.08.2017

What kind of probability are people talking about when they say something is “highly likely” or has “almost no chance”? The chart below, created by Reddit user zonination, visualizes the responses of 46 other Reddit users to “What probability would you assign to the phrase: <phrase>” for various statements of probability. Each set of r...

1411 sym 4 img

Text featurization with the Microsoft ML package

31.08.2017

Last week I wrote about how you can use the MicrosoftML package in Microsoft R to featurize images: reduce an image to a vector of 4096 numbers that quantify the essential characteristics of the image, according to an AI vision model. You can perform a similar featurization process with text as well, but in this case you have a lot more control o...

3543 sym R (1202 sym/4 pcs) 2 img

Practical Data Science for Stats

01.09.2017

PeerJ Preprints has recently published a collection of articles that focus on the practical side of statistical analysis: Practical Data Science for Stats. While the articles are not peer-reviewed, they have been selected and edited by Jennifer Bryan and Hadley Wickham, both well-respected members of the R community. And while the articles provi...

1981 sym 2 img

Preview of EARL London 2017

05.09.2017

The next event in the Effective Applications of the R Language (EARL) conference series takes place next week, with EARL London 2017. The EARL conference series got its start in London, and the London event remains the biggest and brightest of the venues. This year's program is no exception, with an impressive raft of tutorials and keynote speake...

3108 sym