Publications by David Robinson
Exploring handwritten digit classification: a tidy analysis of the MNIST dataset
In a recent post, I offered a definition of the distinction between data science and machine learning: that data science is focused on extracting insights, while machine learning is interested in making predictions. I also noted that the two fields greatly overlap: I use both machine learning and data science in my work: I might fit a model on S...
What digits should you bet on in Super Bowl squares?
My new office introduced me to a betting game I wasn’t previously familiar with: Super Bowl squares. It’s played with a ten-by-ten grid, like the one from printyourbrackets.com. Each row and column gets an assortment of digits from 0-9 representing each team’s score, and each person playing the game (after putting in some money) adds thei...
Data science at DataCamp
In January, I was excited to make an announcement about a shift in my career: “I have some exciting news: today I'm joining @DataCamp as their Chief Data Scientist” (David Robinson, @drob, January 29, 2018). When I first discussed the role with the DataCamp CEO, I described my goal as to “Make DataCamp as good at do...
Scientific debt
A very useful concept in software engineering is technical debt. Technical debt occurs when engineers choose a quick but suboptimal solution to a problem, or don’t spend time to build sustainable infrastructure. Maybe they’re using an approach that doesn’t scale well as the team and codebase expand (such as hardcoding “magic numbers”), ...
Who wrote the anti-Trump New York Times op-ed? Using tidytext to find document similarity
Like a lot of people, I was intrigued by “I Am Part of the Resistance Inside the Trump Administration”, an anonymous New York Times op-ed written by a “senior official in the Trump administration”. And like many data scientists, I was curious about what role text mining could play. Ok NLP people, now’s your chance to shine. Just spitbal...
Exploring college major and income: a live data analysis in R
I recently came up with the idea for a series of screencasts: “I've thought about recording a screencast of an example data analysis in #rstats. I'd do it on a dataset I'm unfamiliar with so that I can show and narrate my live thought process. Any suggestions for interesting datasets to use?” (David Robinson, @drob, October 6, 2018). Hadley Wickham ...
The ‘knight on an infinite chessboard’ puzzle: efficient simulation in R
Previously in this series: the “lost boarding pass” puzzle; the “deadly board game” puzzle. I’ve recently been enjoying The Riddler: Fantastic Puzzles from FiveThirtyEight, a wonderful book from 538’s Oliver Roeder. Many of the probability puzzles can be productively solved through Monte Carlo simulations in R. Here’s one that caugh...
The ‘largest stock profit or loss’ puzzle: efficient computation in R
Previously in this series: the “knight on an infinite chessboard” puzzle; the “lost boarding pass” puzzle; the “deadly board game” puzzle. I recently came across an interview problem from “A Cool SQL Problem: Avoiding For-Loops”. Avoiding loops is a topic I always enjoy reading about, and the blog post didn’t disappoint. I’ll quote ...
The birthday paradox puzzle: tidy simulation in R
Previously in this series: the “lost boarding pass” puzzle; the “deadly board game” puzzle; the “knight on an infinite chessboard” puzzle; the “largest stock profit or loss” puzzle. The birthday problem is a classic probability puzzle, stated something like this: A room has n people, and each has an equal chance of being born on an...
The ‘Spelling Bee Honeycomb’ puzzle: efficient computation in R
Previously in this series: the “lost boarding pass” puzzle; the “deadly board game” puzzle; the “knight on an infinite chessboard” puzzle; the “largest stock profit or loss” puzzle; the “birthday paradox” puzzle. I love 538’s Riddler column, and the January 3 puzzle is a fun one. I’ll quote: The New York Times recently launc...