Publications by David Robinson

Gender and verbs across 100,000 stories: a tidy analysis

27.04.2017

Previously in this series: Examining the arc of 100,000 stories. I was fascinated by my colleague Julia Silge’s recent blog post on what verbs tend to occur after “he” or “she” in several novels, and what they might imply about gender roles within fictional work. This made me wonder what trends could be found across a larger dataset of ...

6444 sym R (2203 sym/5 pcs) 6 img

Slides, videos, and tweets from the 2017 New York R Conference

22.05.2017

In April I attended the 2017 New York R conference, hosted by Lander Analytics and Work-Bench. It was both the third time the conference was held and the third time I’ve attended, and it gets more fun each year, especially because this year eight of us attended from Stack Overflow (including all five of us on the Data Team). Now that the videos...

6040 sym 6 img

Words growing or shrinking in Hacker News titles: a tidy analysis

08.06.2017

In May, some friends and I built Tagger News, a real-time automatic classifier of Hacker News articles based on their text (see here for more about how we built it). This process started me down some interesting paths, particularly analyzing trends in titles. By finding words that became more or less common in Hacker News titles over time, we can...

6282 sym R (4882 sym/11 pcs) 10 img

Two years as a Data Scientist at Stack Overflow

22.06.2017

Last Friday marked my two year anniversary working as a data scientist at Stack Overflow. At the end of my first year I wrote a blog post about my experience, both to share some of what I’d learned and as a form of self-reflection. After another year, I’d like to revisit the topic. While my first post focused mostly on the transition from my ...

10577 sym 2 img

Teach the tidyverse to beginners

05.07.2017

A few years ago, I wrote a post, “Don’t teach built-in plotting to beginners (teach ggplot2)”. I argued that ggplot2 was not an advanced approach meant for experts, but rather a suitable introduction to data visualization. Many teachers suggest I’m overestimating their students: “No, see, my students are beginners…”. If I push the point, ...

15236 sym R (1036 sym/4 pcs) 2 img

Trump’s Android and iPhone tweets, one year later

09.08.2017

A year ago today, I wrote up a blog post, “Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half”. My analysis, shown below, concludes that the Android and iPhone tweets are clearly from different people, posting during different times of day and using hashtags, links, and retweets in distinct ways. What’s more, we ...

10919 sym R (5364 sym/12 pcs) 18 img

Don’t teach students the hard way first

21.09.2017

Imagine you were going to a party in an unfamiliar area, and asked the host for directions to their house. It takes you thirty minutes to get there, on a path that takes you on a long winding road with slow traffic. As the party ends, the host tells you “You can take the highway on your way back, it’ll take you only ten minutes. I just wanted...

7219 sym R (488 sym/2 pcs)

Announcing “Introduction to the Tidyverse”, my new DataCamp course

09.11.2017

For the last few years I’ve been encouraging a particular approach to R education, particularly teaching the dplyr and ggplot2 packages first and introducing real datasets early on. This week I’m excited to announce the next step: the release of Introduction to the Tidyverse, my new interactive course on the DataCamp platform. The course is ...

8002 sym 2 img

Advice to aspiring data scientists: start a blog

14.11.2017

Last week I shared a thought on Twitter: “When you’ve written the same code 3 times, write a function. When you’ve given the same in-person advice 3 times, write a blog post.” (David Robinson, @drob, November 9, 2017) Ironically, this tweet hints at a piece of advice I’ve given at least 3 dozen times, but haven’t yet written a post about. I...

11675 sym

What’s the difference between data science, machine learning, and artificial intelligence?

09.01.2018

When I introduce myself as a data scientist, I often get questions like “What’s the difference between that and machine learning?” or “Does that mean you work on artificial intelligence?” I’ve responded enough times that my answer easily qualifies for my “rule of three”: When you’ve written the same code 3 times, write a functio...

10924 sym 2 img