Publications by David Smith

Text categorization with deep learning, in R

03.08.2017

Given a short review of a product, like “I couldn't put it down!”, can you predict what the product is? In that case it's pretty easy — it's for a book — but this general problem of text categorization comes up in a lot of natural language analysis problems. In his talk at useR!2017 (shown below), Microsoft data scientist Angus Taylor dem...

1798 sym

Painting with Data

04.08.2017

The accidental aRt tumblr (mentioned here a few years ago) continues to provide a steady stream of images that wouldn’t look out of place in a modern art gallery, but which in fact are data visualizations (mostly attempted in R), gone wrong. (Here’s a typical recent entry.) But now, Giora Simchoni has taken this concept to the next level by ...

2282 sym 2 img

How to make best use of the byte compiler in R

07.08.2017

Tomas Kalibera, the newest member of the R Core Team, has been working for the last several years with fellow Core Team member Luke Tierney implementing R's byte-code compiler and interpreter. Byte-compiling R code often improves its speed of execution, and usually happens without you having to take any explicit action. R's base and recommended...

1777 sym

Tutorial: Publish an R function as a SQL Server stored procedure with the sqlrutils package

08.08.2017

In SQL Server 2016 and later, you can publish an R function to the database as a stored procedure. This makes it possible to run your R function on the SQL Server itself, which makes the power of that server available for R computations, and also eliminates the time required to move data to and from the server. It also makes your R function avail...

1608 sym 2 img

In case you missed it: July 2017 roundup

11.08.2017

In case you missed them, here are some articles from July of particular interest to R users. A tutorial on using the rsparkling package to apply H2O's algorithms to data in HDInsight. Several exercises to learn parallel programming with the foreach package. A presentation on the R6 class system, by Winston Chang. Introducing “joyplots”, a ggp...

2452 sym

Reproducibility: A cautionary tale from data journalism

14.08.2017

Timo Grossenbacher, data journalist with Swiss Radio and TV in Zurich, had a bit of a surprise when he attempted to recreate the results of one of the R Markdown scripts published by SRF Data to accompany their data journalism story about vested interests of Swiss members of parliament. Upon re-running the analysis in R last week, Timo was surpr...

3095 sym 2 img

Buzzfeed trains an AI to find spy planes

15.08.2017

Last year, Buzzfeed broke the story that US law enforcement agencies were using small aircraft to observe points of interest in US cities, thanks to analysis of public flight-records data. With the data journalism team no doubt realizing that the Flightradar24 data set hosted many more stories of public interest, the challenge lay in separating r...

2157 sym 2 img

How to build an image recognizer in R using just a few images

16.08.2017

Microsoft Cognitive Services provides several APIs for image recognition, but if you want to build your own recognizer (or create one that works offline), you can use the new Image Featurizer capabilities of Microsoft R Server. The process of training an image recognition system requires LOTS of images: millions and millions of them. The pro...

3805 sym R (315 sym/1 pcs) 6 img

20 years of the R Core Group

17.08.2017

The first “official” version of R, version 1.0.0, was released on February 29, 2000. But the R Project had already been underway for several years before then. Sharing this tweet, from yesterday, from R Core member Peter Dalgaard: It was twenty years ago today, Ross Ihaka got the band to play…. #rstats pic.twitter.com/msSpPz2kyA - Peter D...

1460 sym

Obstacles to performance in parallel programming

18.08.2017

Making your code run faster is often the primary goal when using parallel programming techniques in R, but sometimes the effort of converting your code to use a parallel framework leads only to disappointment, at least initially. Norman Matloff, author of Parallel Computing for Data Science: With Examples in R, C++ and CUDA, has shared chapter ...

1623 sym