Publications by David Smith

Free guide to text mining with R

20.01.2017

Julia Silge and David Robinson are both dab hands at using R to analyze text, from tracking the happiness (or otherwise) of Jane Austen characters, to identifying whether Trump's tweets came from him or a staffer. If you too would like to be able to make statistical sense of masses of (possibly messy) text data, check out their book Tidy Tex...

Upcoming R Conferences

23.01.2017

Since a few new events have been announced recently, I thought I'd give a run-down on some major R conferences coming up in the next six months. February 18: satRdays, Cape Town (South Africa). This is the second in a series of one-day conferences inspired by an R Consortium proposal. The first event in Budapest was a great success, and the line...

Building a machine learning model with the MicrosoftML package

24.01.2017

Microsoft R Server 9 includes a new R package for machine learning: MicrosoftML. (So do the Data Science Virtual Machine and the free Microsoft R Client edition, incidentally.) This package includes a suite of fast predictive modeling functions implemented by Microsoft Research, including: Linear (rxFastLinear) and logistic (rxLogisticRegressi...

New Zealand bank replaces SAS server with R Server

26.01.2017

Heartland Bank, a rapidly growing bank in New Zealand, has adopted a data-driven approach to analyzing risk, evaluating credit lines, and understanding cash flows. But they found their legacy SAS system to be labor-intensive and time-consuming when it came to updating financial models, and it was expensive to boot. (Being licensed on a per-user ...

Kung Fu R

26.01.2017

A great way to hone your skills as a data scientist is to pick a topic you're passionate about, find some data related to it, and analyze the heck out of it. Jim Vallandingham is clearly passionate about old Kung Fu movies, particularly those from the Shaw Brothers Studio, and has used R to analyze data on the studio's oeuvre: 260 films ov...

CRAN now has 10,000 R packages. Here’s how to find the ones you need.

27.01.2017

CRAN, the global repository of open-source packages that extend the capabilities of R, reached a milestone today. There are now more than 10,000 R packages available for download*. (Incidentally, that count doesn't even include all the R packages out there. There are also another 1294 packages for genomic analysis in the Bioconductor repository...

List of R conferences and user groups (2017-01-30)

30.01.2017

For 8 years now, we've maintained a list of local R user groups here at the Revolutions blog. This is a list that began with a single group (the Bay Area RUG, the first and still one of the largest groups), and now includes 360 user groups worldwide (including 27 specifically for women). As the list has grown in size, it's become harder to manage...

Data Science Virtual Machine updated, now includes RStudio, JuliaPro

31.01.2017

The Windows edition of the Data Science Virtual Machine (DSVM) was recently updated on the Azure Marketplace. This update upgrades some existing components and adds some new ones as well. You now have your choice of integrated development environment to use with R. RStudio Desktop is now included in the Data Science Virtual Machine image — n...

A look back at the year in R and Microsoft

01.02.2017

Thomas Dinsmore's ML/DL blog recently concluded a look back at significant advancements in data science, machine learning and deep learning, many of which involved R and/or Microsoft. Here are those highlights (reproduced with permission). The R Project: R and Python maintained their leadership as primary tools for open data science. The Pytho...

fst: Fast serialization of R data frames

02.02.2017

If you want to get data out of R and into another application or system, simply copying the data as it resides in memory generally isn't an option. Instead you have to serialize the data (into a file, usually), which the other application can then deserialize to recreate the original data. R has several options to serialize data frames: You can...
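The serialize/deserialize round trip described above can be illustrated with a minimal sketch. It is written in Python purely as a language-neutral illustration: it does not use R, fst, or any R serialization function, and the dict-of-columns "table" is an assumed stand-in for a data frame. It contrasts a text format (CSV), which other applications can read but which stores only strings, with a binary format (Python's `pickle`), which preserves types but is specific to one language.

```python
import csv
import os
import pickle
import tempfile

# Hypothetical stand-in for a data frame: a dict of equal-length columns.
table = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "score": [0.5, 1.25, 2.0],
}


def write_csv(path, cols):
    """Serialize the columns to a CSV text file, row by row."""
    names = list(cols)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(names)  # header row with column names
        writer.writerows(zip(*(cols[n] for n in names)))


def read_csv(path, types):
    """Deserialize a CSV file; every field comes back as text,
    so each column's type must be restored explicitly."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        names = next(reader)
        rows = list(reader)
    return {n: [types[n](row[i]) for row in rows] for i, n in enumerate(names)}


tmpdir = tempfile.mkdtemp()

# Text round trip: portable, but types are lost unless supplied on read.
csv_path = os.path.join(tmpdir, "table.csv")
write_csv(csv_path, table)
from_csv = read_csv(csv_path, {"id": int, "name": str, "score": float})

# Binary round trip: types are preserved, but the format is Python-specific.
pkl_path = os.path.join(tmpdir, "table.pkl")
with open(pkl_path, "wb") as f:
    pickle.dump(table, f)
with open(pkl_path, "rb") as f:
    from_pickle = pickle.load(f)

print(from_csv == table, from_pickle == table)
```

Both round trips recreate the original table here; the difference lies in what each format needs from the reader (type information for CSV, a compatible runtime for pickle).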