Publications by David Robinson
Analysis of the #7FavPackages hashtag
Twitter has seen a recent trend of “first 7” and “favorite 7” hashtags, like #7FirstJobs and #7FavFilms. Last week I added one to the mix, about my 7 favorite R packages: devtools, dplyr, ggplot2, knitr, Rcpp, rmarkdown, shiny #7FavPackages #rstats (David Robinson, @drob, August 16, 2016). Hadley Wickham agreed to share his own, but on one condition: ...
5949 sym R (5553 sym/14 pcs) 12 img
Tidying computational biology models with biobroom: a case study in tidy analysis
Previously in this series: “Cleaning and visualizing genomic data: a case study in tidy analysis”; “Modeling gene expression with broom: a case study in tidy analysis”. In previous posts, I’ve examined the benefits of the tidy data framework in cleaning, visualizing, and modeling in exploratory data analysis on a molecular biology experiment. We’...
9623 sym R (7849 sym/22 pcs) 12 img
Understanding empirical Bayesian hierarchical modeling (using baseball statistics)
Previously in this series: Understanding the beta distribution; Understanding empirical Bayes estimation; Understanding credible intervals; Understanding the Bayesian approach to false discovery rates; Understanding Bayesian A/B testing; Understanding beta binomial regression. Suppose you were a scout hiring a new baseball player, and were choosing b...
12886 sym R (4160 sym/13 pcs) 16 img
The ‘deadly board game’ puzzle: efficient simulation in R
Last Friday’s “The Riddler” column on FiveThirtyEight presents an interesting probabilistic puzzle: While traveling in the Kingdom of Arbitraria, you are accused of a heinous crime. Arbitraria decides who’s guilty or innocent not through a court system, but a board game. It’s played on a simple board: a track with sequential spaces num...
12485 sym R (8457 sym/38 pcs) 8 img
Analysis of software developers in New York, San Francisco, London and Bangalore
(Note: Cross-posted with the Stack Overflow Blog.) When I tell someone Stack Overflow is based in New York City, they’re often surprised: many people assume it’s in San Francisco. (I’ve even seen job applications with “I’m in New York but willing to relocate to San Francisco” in the cover letter.) San Francisco is a safe guess of wher...
5933 sym 10 img
Understanding mixture models and expectation-maximization (using baseball statistics)
Previously in this series: Understanding the beta distribution; Understanding empirical Bayes estimation; Understanding credible intervals; Understanding the Bayesian approach to false discovery rates; Understanding Bayesian A/B testing; Understanding beta binomial regression; Understanding empirical Bayesian hierarchical modeling. In this series on e...
14857 sym R (8065 sym/25 pcs) 24 img
Introducing the ebbr package for empirical Bayes estimation (using baseball statistics)
Previously in this series: The beta distribution; Empirical Bayes estimation; Credible intervals; The Bayesian approach to false discovery rates; Bayesian A/B testing; Beta-binomial regression; Understanding empirical Bayesian hierarchical modeling; Mixture models and expectation-maximization. We’ve introduced a number of statistical techniques in th...
12245 sym R (12725 sym/41 pcs) 20 img
Simulation of empirical Bayesian methods (using baseball statistics)
Previously in this series: The beta distribution; Empirical Bayes estimation; Credible intervals; The Bayesian approach to false discovery rates; Bayesian A/B testing; Beta-binomial regression; Understanding empirical Bayesian hierarchical modeling; Mixture models and expectation-maximization; The ebbr package. We’re approaching the end of this series...
18021 sym R (9998 sym/42 pcs) 30 img
Announcing the release of my e-book: Introduction to Empirical Bayes
I’m excited to announce the release of my new e-book: Introduction to Empirical Bayes: Examples from Baseball Statistics, available here. This book is adapted from a series of ten posts on my blog, starting with Understanding the beta distribution and ending recently with Simulation of empirical Bayesian methods. In these posts I’ve introduce...
3594 sym 2 img
Examining the arc of 100,000 stories: a tidy analysis
I recently came across a great natural language dataset from Mark Riedel: 112,000 plots of stories downloaded from English language Wikipedia. This includes books, movies, TV episodes, video games: anything that has a Plot section on a Wikipedia page. This offers a great opportunity to analyze story structure quantitatively. In this post I’ll d...
7367 sym R (3356 sym/11 pcs) 10 img