Publications by David Robinson

Analysis of the #7FavPackages hashtag

26.08.2016

Twitter has seen a recent trend of “first 7” and “favorite 7” hashtags, like #7FirstJobs and #7FavFilms. Last week I added one to the mix, about my 7 favorite R packages: devtools, dplyr, ggplot2, knitr, Rcpp, rmarkdown, shiny. #7FavPackages #rstats (David Robinson, @drob, August 16, 2016) Hadley Wickham agreed to share his own, but on one condition: ...

5949 sym R (5553 sym/14 pcs) 12 img

Tidying computational biology models with biobroom: a case study in tidy analysis

06.09.2016

Previously in this series: Cleaning and visualizing genomic data: a case study in tidy analysis; Modeling gene expression with broom: a case study in tidy analysis. In previous posts, I’ve examined the benefits of the tidy data framework in cleaning, visualizing, and modeling in exploratory data analysis on a molecular biology experiment. We’...

9623 sym R (7849 sym/22 pcs) 12 img

Understanding empirical Bayesian hierarchical modeling (using baseball statistics)

11.10.2016

Previously in this series: Understanding the beta distribution; Understanding empirical Bayes estimation; Understanding credible intervals; Understanding the Bayesian approach to false discovery rates; Understanding Bayesian A/B testing; Understanding beta binomial regression. Suppose you were a scout hiring a new baseball player, and were choosing b...

12886 sym R (4160 sym/13 pcs) 16 img

The ‘deadly board game’ puzzle: efficient simulation in R

19.10.2016

Last Friday’s “The Riddler” column on FiveThirtyEight presents an interesting probabilistic puzzle: While traveling in the Kingdom of Arbitraria, you are accused of a heinous crime. Arbitraria decides who’s guilty or innocent not through a court system, but a board game. It’s played on a simple board: a track with sequential spaces num...

12485 sym R (8457 sym/38 pcs) 8 img

Analysis of software developers in New York, San Francisco, London and Bangalore

01.12.2016

(Note: Cross-posted with the Stack Overflow Blog.) When I tell someone Stack Overflow is based in New York City, they’re often surprised: many people assume it’s in San Francisco. (I’ve even seen job applications with “I’m in New York but willing to relocate to San Francisco” in the cover letter.) San Francisco is a safe guess of wher...

5933 sym 10 img

Understanding mixture models and expectation-maximization (using baseball statistics)

02.01.2017

Previously in this series: Understanding the beta distribution; Understanding empirical Bayes estimation; Understanding credible intervals; Understanding the Bayesian approach to false discovery rates; Understanding Bayesian A/B testing; Understanding beta binomial regression; Understanding empirical Bayesian hierarchical modeling. In this series on e...

14857 sym R (8065 sym/25 pcs) 24 img

Introducing the ebbr package for empirical Bayes estimation (using baseball statistics)

05.01.2017

Previously in this series: The beta distribution; Empirical Bayes estimation; Credible intervals; The Bayesian approach to false discovery rates; Bayesian A/B testing; Beta-binomial regression; Understanding empirical Bayesian hierarchical modeling; Mixture models and expectation-maximization. We’ve introduced a number of statistical techniques in th...

12245 sym R (12725 sym/41 pcs) 20 img

Simulation of empirical Bayesian methods (using baseball statistics)

11.01.2017

Previously in this series: The beta distribution; Empirical Bayes estimation; Credible intervals; The Bayesian approach to false discovery rates; Bayesian A/B testing; Beta-binomial regression; Understanding empirical Bayesian hierarchical modeling; Mixture models and expectation-maximization; The ebbr package. We’re approaching the end of this series...

18021 sym R (9998 sym/42 pcs) 30 img

Announcing the release of my e-book: Introduction to Empirical Bayes

07.02.2017

I’m excited to announce the release of my new e-book: Introduction to Empirical Bayes: Examples from Baseball Statistics, available here. This book is adapted from a series of ten posts on my blog, starting with Understanding the beta distribution and ending recently with Simulation of empirical Bayesian methods. In these posts I’ve introduce...

3594 sym 2 img

Examining the arc of 100,000 stories: a tidy analysis

26.04.2017

I recently came across a great natural language dataset from Mark Riedel: 112,000 plots of stories downloaded from English-language Wikipedia. This includes books, movies, TV episodes, and video games: anything that has a Plot section on a Wikipedia page. This offers a great opportunity to analyze story structure quantitatively. In this post I’ll d...

7367 sym R (3356 sym/11 pcs) 10 img