Publications by David Smith

AirBnB grows by sharing data scientist knowledge

10.11.2016

This animation of AirBnB host locations from 2011-2014, presented by Ricardo Bion (data scientist manager at AirBnB) at the EARL Boston conference earlier this week, shows the dramatic growth in properties to rent through the service along with the most common routes of travellers. (You can find the R code that created this animation here.) How ...

2059 sym 4 img

A computer vision challenge: finding boats in the Mona Lisa

11.11.2016

About a decade or so ago, photomosaics were all the rage: a near-recreation of a famous image by using many smaller images as elements. Here, for example, is the Mona Lisa, created using the Metapixel program by overlaying 32×32 images of vehicles and animals. An image like this presents an interesting computer vision challenge: can you use deep ...

2067 sym 4 img

In case you missed it: October 2016 roundup

15.11.2016

In case you missed them, here are some articles from October of particular interest to R users. A brief summary of the R 3.3.2 release. “Data Science with SQL Server 2016”, a free e-book featuring several in-depth R examples, is now available for download. The ReporteRs package makes it easy to insert R output, tables and graphics into Wor...

2725 sym

How to call Cognitive Services APIs with R

16.11.2016

Microsoft Cognitive Services is a set of cloud-based machine-intelligence APIs that you can use to extract structured data from complex sources (unstructured text, images, video and audio), and add “AI” type features to applications. A good example is the “Seeing AI” glasses in the video below: the image descriptions, emotion inference, a...

2143 sym

Notable New and Updated R packages (to October 2016)

17.11.2016

As we prepare for the upcoming release of Microsoft R Open, I've been preparing the list of new and updated packages for the spotlights page. This involves scanning the CRANberries feed (with gracious thanks to Dirk Eddelbuettel) for newly-released packages and significant updates to existing ones. This is a lot of data to process. For context, ...

7316 sym

The 5 most popular R packages

18.11.2016

The good folks at DataCamp track activity related to R packages on the RDocumentation.org Trends page. As of this writing, it tracks statistics on 11,768 packages (distributed across CRAN, Bioconductor and GitHub) comprising over 1.7 million R functions in total. On that page, you can find current rankings on the most downloaded R packages, the m...

2283 sym

Tutorial: Build a live rental prediction service with SQL Server R Services

23.11.2016

A great way to learn is by doing, so if you've been thinking about how to enable R-based computations within SQL Server, a new tutorial will take you through all the steps of building an intelligent application. In a few simple steps, you'll set up all the necessary software and code to build a live service that predicts demand for a ski rental s...

1549 sym 2 img

Happy Thanksgiving! (2016)

24.11.2016

It's Thanksgiving day here in the US, so we're taking the rest of the week off to reflect on what we're thankful for. And even if you're not in the US, today is a great day to send thanks to the R Core Group for providing their dedication, time, and expertise to make the R Project what it is today. (Sadly, cowsay doesn't feature a Thanksgivin...

849 sym 2 img

A heat map of Divvy bike riders in Chicago

28.11.2016

Chicago's a great city for a bike-sharing service. It's pretty flat, and there are lots of wide roads with cycle lanes. I love Divvy and use it all the time. Not so much in the winter though: it gets very cold here. Nonetheless, this heat map of Divvy riders, created in R by Austin Wehrwein, reveals a hardcore set of riders that use the service ...

1127 sym 2 img

Free online course: Analyzing big data with Microsoft R Server

29.11.2016

If you're already familiar with R, but struggling with out-of-memory or performance problems when attempting to analyze large data sets, you might want to check out this new EdX course, Analyzing Big Data with Microsoft R Server, presented by my colleague Seth Mottaghinejad. In the course, you'll learn how to build models using the RevoScaleR pa...

1550 sym