Publications by David Smith

Comparing subreddits, with Latent Semantic Analysis in R

24.03.2017

FiveThirtyEight published a fascinating article this week about the subreddits that provided support to Donald Trump during his campaign, and continue to do so today. Reddit, for those not in the know, is a popular online social community organized into thousands of discussion topics, called subreddits (the names all begin with “r/”). Most ...

2520 sym 4 img

Data science languages score highly in RedMonk rankings

27.03.2017

RedMonk has once again updated (a little later than usual) its bi-annual programming language report with the January 2017 rankings. If you haven't come across these rankings before, they are based on GitHub contributions and StackOverflow questions related to around 40 commonly used programming languages. The raw data (as of January 2017) i...

1548 sym 2 img

UK government using R to modernize reporting of official statistics

28.03.2017

Like all governments, the UK government is responsible for producing reports of official statistics on an ongoing basis. That process has traditionally been a highly manual one: extract data from government systems, load it into a mainframe statistical analysis tool and run models and forecasts, extract the results to a spreadsheet to prepare da...

3728 sym 8 img

Learning Scrabble strategy from robots, using R

30.03.2017

While you might think of Scrabble as that game you play with your grandparents on a rainy Sunday, some people take it very seriously. There's an international competition devoted to Scrabble, and no end of guides and strategies for competitive play. James Curley, a psychology professor at Columbia University, has used an interesting method to col...

5063 sym 4 img

Tutorial: Using R for Scalable Data Analytics

31.03.2017

At the recent Strata conference in San Jose, several members of the Microsoft Data Science team presented the tutorial Using R for Scalable Data Analytics: Single Machines to Spark Clusters. The materials are all available online, including the presentation slides and hands-on R scripts. You can follow along with the materials at home, using th...

2096 sym 6 img

The Most Popular Languages for Data Scientists/Engineers

03.04.2017

The results of the 2017 StackOverflow Survey of nearly 65,000 developers were published recently, and include lots of interesting insights about their work, lives and preferences. The results include a cross-tabulation of the most popular languages amongst the “Data Scientist/Engineer” subset, and the results were … well, surprising: Whe...

2537 sym 4 img

Publish R functions as stored procedures with the sqlrutils package

04.04.2017

If you've created an R function (say, a routine to clean up missing values in a data set, or a function to make forecasts using a machine learning model), and you want to make it easy for DBAs to use it, it's now possible to publish R functions as SQL Server 2016 stored procedures. The sqlrutils package provides tools to convert an existing R f...

3537 sym

Fitting a rational function in R using ordinary least-squares regression

06.04.2017

by Srini Kumar, VP of Product Management and Data Science, LevaData; and Bob Horton, Senior Data Scientist, Microsoft. A rational function is defined as the ratio of two polynomials. The Padé Approximant uses a ratio of polynomials to approximate functions: $$ R(x)= \frac{\sum_{j=0}^m a_j x^j}{1+\sum_{k=1}^n b_k x^k}=\frac{a_0+a_1x+a_2x^2+\cdots+a_...

3616 sym R (1327 sym/5 pcs) 10 img
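
A note on why ordinary least squares applies to the form R(x) shown above: multiplying both sides of y = R(x) by the denominator gives y = a_0 + a_1 x + ... + a_m x^m - b_1 (x y) - ... - b_n (x^n y), which is linear in the unknown coefficients a_j and b_k. The sketch below illustrates that idea in plain Python rather than R. The excerpt is truncated, so this is not necessarily the authors' exact procedure, and the helper names `solve_linear` and `fit_rational` are hypothetical.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_rational(xs, ys, m, n):
    """Fit y ~ (sum_j a_j x^j) / (1 + sum_k b_k x^k) via the linearized model.

    Returns (a, b) with a = [a_0..a_m] and b = [b_1..b_n].
    """
    # One design row per point: [1, x, ..., x^m, -y*x, ..., -y*x^n].
    rows = [
        [x ** j for j in range(m + 1)] + [-y * x ** k for k in range(1, n + 1)]
        for x, y in zip(xs, ys)
    ]
    p = m + 1 + n
    # Normal equations (X^T X) z = X^T y for ordinary least squares.
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    z = solve_linear(XtX, Xty)
    return z[: m + 1], z[m + 1:]


# Usage: noise-free data generated from (1 + 2x) / (1 + 0.5x).
xs = [i / 10 for i in range(21)]
ys = [(1 + 2 * x) / (1 + 0.5 * x) for x in xs]
a, b = fit_rational(xs, ys, m=1, n=1)
```

Note that this minimizes the residual of the linearized equation, not the error in y directly, so with noisy data the estimates can differ from a nonlinear least-squares fit. The normal equations are used here only to keep the sketch dependency-free; a QR-based solver is usually preferable numerically.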

Microsoft R Open 3.3.3 now available

06.04.2017

Microsoft R Open (MRO), Microsoft's enhanced distribution of open source R, has been upgraded to version 3.3.3, and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to R 3.3.3, upgrades the installer, and updates the bundled packages. R 3.3.3 makes just a few minor fixes compared to R 3.3.2 ...

1563 sym

The faces of R, analyzed with R

07.04.2017

Maëlle Salmon recently created a collage of profile pictures of people who use the #rstats hashtag in their Twitter bio to indicate their use of R. (I've included a detail below; click to see the complete version at Maëlle's blog.) Naturally, Maëlle created the collage using R itself. Matching Twitter bios were found using the search_user...

2005 sym 4 img