Publications by R on Data & The World
MST3K Episode vs Movie Scores
First broadcast in 1988, Mystery Science Theater 3000 is a television show whose nominal story involves a guy being trapped in space by a couple of mad scientist types…which is actually just an excuse to have a few guys make fun of really, really bad movies. This raises a few unusual questions about the series (as far as TV series go, anyway), ...
4319 sym 8 img
An (Animated) Example of Bayesian Updating
Bayesian statistics is centered on constructing certain assumptions about how the probability of an event is distributed, and then adjusting that belief as new information comes in. Constructing a Bayesian model can be more involved than the “look at many things in aggregate” approach used in frequentist statistics. But it has nic...
3362 sym 8 img 1 tbl
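To make the "adjust the belief as new information comes in" idea concrete, here is a minimal Python sketch (the post itself appears to work in R). It assumes a Beta prior on a success probability with binomial data, which is the textbook conjugate case and not necessarily the model the post uses. The batch counts are made up for illustration.

```python
# Hypothetical illustration: Beta-Binomial updating, one batch at a time.

def update_beta(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing one batch."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (1.0, 1.0)                  # flat prior: every probability equally plausible
batches = [(3, 1), (2, 2), (4, 0)]  # made-up (successes, failures) per batch

posterior = prior
history = [beta_mean(*prior)]       # track how the point estimate moves
for successes, failures in batches:
    posterior = update_beta(*posterior, successes, failures)
    history.append(beta_mean(*posterior))

print(posterior)
print([round(m, 3) for m in history])
```

Each pass through the loop treats the previous posterior as the new prior, which is the step an animation of Bayesian updating would draw frame by frame.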
Looking Normal(ly Distributed)
Among all probability distributions, the normal distribution is probably the best established and best characterized. Results like the central limit theorem and the normality assumptions in linear regression highlight its importance. One of its more interesting properties is the fact that you can approximate a binomial distribution with ...
8194 sym R (54 sym/1 pcs) 18 img
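The excerpt is cut off, but assuming it refers to the familiar normal approximation to the binomial, the comparison can be sketched in a few lines of Python (the post's own code is in R). The values of `n`, `p`, and `k` are arbitrary choices for illustration, and the continuity correction is a standard refinement that the post may or may not use.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma) distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def normal_approx_binom_cdf(k, n, p):
    """Approximate P(X <= k) with a normal matched on mean and variance."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma)  # +0.5 is the continuity correction

n, p, k = 100, 0.5, 55  # illustrative values only
exact = binom_cdf(k, n, p)
approx = normal_approx_binom_cdf(k, n, p)
print(f"exact={exact:.4f} approx={approx:.4f} diff={abs(exact - approx):.5f}")
```

The approximation is usually considered reasonable when both `n * p` and `n * (1 - p)` are comfortably large. It degrades for small `n` or for `p` near 0 or 1.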
538 Dungeons & Dragons Riddler
This problem was the Riddler Classic on 538 for May 15, 2020. The problem is as follows: The fifth edition of Dungeons & Dragons introduced a system of “advantage and disadvantage.” When you roll a die “with advantage,” you roll the die twice and keep the higher result. Rolling “with disadvantage” is similar, except you keep the lowe...
8751 sym R (2430 sym/10 pcs) 2 img 2 tbl
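The advantage and disadvantage rules quoted above are easy to enumerate exactly. The following Python sketch (the post's own code is in R) assumes a 20-sided die and does not attempt the Riddler question itself. It only tabulates the distribution of "keep the higher" and "keep the lower" of two rolls.

```python
from fractions import Fraction
from itertools import product

def roll_distribution(sides=20, mode="advantage"):
    """Exact distribution of the kept face when rolling two dice.

    mode="advantage" keeps the higher roll; any other value keeps the lower.
    """
    pick = max if mode == "advantage" else min
    counts = {face: 0 for face in range(1, sides + 1)}
    # Enumerate every ordered pair of rolls; each pair is equally likely.
    for a, b in product(range(1, sides + 1), repeat=2):
        counts[pick(a, b)] += 1
    total = sides * sides
    return {face: Fraction(c, total) for face, c in counts.items()}

def expected_value(dist):
    """Mean of a {face: probability} distribution."""
    return sum(face * prob for face, prob in dist.items())

adv = roll_distribution(20, "advantage")
dis = roll_distribution(20, "disadvantage")
print(float(expected_value(adv)), float(expected_value(dis)))
```

Using `Fraction` keeps the probabilities exact, so combinations such as "advantage of disadvantage" could be compared without floating-point noise.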
Detecting Streaks in R
Inspired by this post, which tries to calculate streaks in Python’s pandas library, I thought I’d give it a try in R, since it’s all just dataframe operations in the Python post. I won’t repeat his analysis, but I will replicate the streak determination and some of the plots. The data he uses is here. Determining Streaks: As outlined in th...
2227 sym R (2324 sym/8 pcs) 4 img
A Slightly More Advanced MCMC Example
I’ve seen a number of examples of MCMC algorithms, and while they’re all solid, a lot of them tend to be a bit too neat – they have a fairly simple model, a single predictor (maybe two), and not much else. This one is a good example, as it covers the theory in detail, but it’s using an obviously toy data set. So I decided to throw togethe...
6292 sym R (2536 sym/10 pcs) 4 img
Formatting With ggtext Example
This is a quick example regarding the ggtext package. It’s one of the many packages that extend ggplot2, with this one having a focus on adding and formatting text in graphs. The particularly interesting thing for me is that it allows Markdown and other formatting of the labels in a graph. Let’s throw together a facet plot for the principal ...
1815 sym R (1229 sym/4 pcs) 4 img
Spotify Cross-Playlist Predictions, Part 1
This is the first of probably two posts detailing the construction of an RShiny app. The app in question is meant to take data from two Spotify playlists and make recommendations for tracks from one (which I’ll call the “target” playlist) based on the contents of another (the “reference” playlist). I don’t expect this to be compa...
8994 sym 12 img
Spotify Cross-Playlist Predictions, Part 2
This is a follow-up to the previous post, where the mechanics of making cross-playlist predictions were covered. This post covers the second half of the project: now that we have the analysis method and the important functions worked out in practice, we need to code this functionality into a Shiny app, create a Docker container that holds and run...
6126 sym R (983 sym/3 pcs) 4 img
An Example With accumulate()
As with most useful (collections of) libraries, the tidyverse has a lot to offer. One interesting bit that I found recently was the accumulate() function in the purrr package, which applies a function cumulatively over the values in a vector, carrying each result forward into the next call. This post is a quick example of its use, using linear regression models. The documentation giv...
2881 sym R (1630 sym/3 pcs) 2 img