Publications by David Smith

Webinar and free e-book on data preparation with R

15.03.2016

Just a quick heads up that Nina Zumel, co-founder and principal consultant at Win-Vector LLC, will be presenting a webinar at 10AM Pacific Time on Thursday March 17, Data Preparation Techniques with R. Nina is the co-author of Practical Data Science with R and blogs frequently at the Win-Vector blog (and contributes the occasional guest blog her...

1928 sym 2 img

Creating a March Madness bracket with Machine Learning

18.03.2016

March Madness is upon us here in the US. This annual college basketball competition pits 64 teams against each other in a single-elimination tournament, and the team that goes undefeated for all 6 rounds will be named NCAA Champion. Predicting the winners of the competition, and in particular completing a “bracket” of the teams you predict to make it to the fin...

1791 sym 4 img

R Consortium announces new grants for R projects and working groups

23.03.2016

Five months ago, the R Consortium asked the R Community to propose projects to benefit R users and the R project. Today, the R Consortium announced that it has awarded grants to fund seven of those projects: a unified framework for distributed computing with R; an improved database interface; a one-day workshop to unite R language developers, i...

1776 sym

Introductions to R and predictive analytics

25.03.2016

If you're new to the concept of predictive models, or just want to review the background on how data scientists learn from past data to predict the future, you may be interested in my talk from the Data Insights Summit, Introduction to Real-Time Predictive Modeling. In the talk above I gave a brief introduction to the R language and mentioned se...

1056 sym

About those weird things in R…

28.03.2016

There's no denying that, for a language as popular as R, it has more than its fair share of quirks. If you've ever wondered why, for example, R has a non-standard assignment operator, or why periods are allowed in symbols (and don't signify method calls), or why character data imports as factors (not strings) by default, then this blog post by ...

1114 sym 2 img

Two fun plots with R

01.04.2016

Data visualization with R doesn't always have to be serious. Here are a couple of fun charts created recently by R users. First, here's a minimalist rendition of the characters in The Simpsons, by an anonymous blogger. And from Alex Whan, here's a near-perfect recreation of the classic cover of the Joy Division album Unknown Pleasures, based on ...

881 sym 4 img

Help improve treatment for brain injuries using machine learning and R

04.04.2016

The field of neuroscience — the study of brains and the nervous system — has taken some major leaps in recent years. Scientists can now gather real-time electrical activity from the brain during actions and thoughts, which is helping to pinpoint the exact location of brain lesions caused by strokes, and is leading to promising treatments for ...

2554 sym 2 img

Airbnb uses R to scale data science

05.04.2016

Airbnb, the property-rental marketplace that helps you find a place to stay when you're travelling, uses R to scale data science. Airbnb is a famously data-driven company, and has recently gone through a period of rapid growth. To accommodate the influx of data scientists (80% of whom are proficient in R, and 64% use R as their primary data an...

2554 sym 6 img

In case you missed it: March 2016 roundup

08.04.2016

In case you missed them, here are some articles from March of particular interest to R users. Reviews of new CRAN packages RtutoR, lavaan.shiny, dCovTS, glmmsr, GLMMRR, MultivariateRandomForest, genie, kmlShape, deepboost and rEDM. You can now create and host Jupyter notebooks based on R, for free, in Azure ML Studio. Calculating learning ...

2892 sym

The FBI’s aerial surveillance program, visualized with R

11.04.2016

Buzzfeed's Peter Aldhous and Charles Seife broke a major news story last week: the US Federal Bureau of Investigation and Department of Homeland Security operate more than 200 small aircraft (mainly Cessnas and some helicopters) which routinely circle various sites near US cities, presumably to gather data with onboard cameras and electronic equi...

2487 sym 4 img