Publications by kjytay
What is the Atkinson index?
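The Atkinson index, introduced by Atkinson (1970) (Reference 1), is a measure of inequality used in economics. Given a population with values $x_1, \dots, x_n$ and an inequality-aversion parameter $\varepsilon$, the Atkinson index is defined as $A_\varepsilon(x_1, \dots, x_n) = 1 - \frac{1}{\mu} \left( \frac{1}{n} \sum_{i=1}^n x_i^{1-\varepsilon} \right)^{1/(1-\varepsilon)}$ for $\varepsilon \neq 1$, where $\mu = \frac{1}{n} \sum_{i=1}^n x_i$. If we denote the Hölder mean by $M_p(x_1, \dots, x_n) = \left( \frac{1}{n} \sum_{i=1}^n x_i^p \right)^{1/p}$, then the Atkinson index is simply $A_\varepsilon = 1 - M_{1-\varepsilon} / M_1$. While the index is defined for all ...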
3537 sym R (1868 sym/4 pcs) 73 img
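To make the Atkinson index definition concrete, here is a minimal R sketch. The function name `atkinson_index` and its arguments are hypothetical (they are not taken from the post), and the `epsilon == 1` branch uses the standard limiting case, in which the Hölder mean becomes the geometric mean.

```r
# Hypothetical helper (not from the original post): Atkinson index for
# strictly positive values x and inequality-aversion parameter epsilon >= 0.
atkinson_index <- function(x, epsilon) {
  stopifnot(is.numeric(x), all(x > 0), epsilon >= 0)
  mu <- mean(x)  # arithmetic mean, i.e. the Hölder mean with p = 1
  if (epsilon == 1) {
    # Limiting case epsilon -> 1: the Hölder mean becomes the geometric mean
    holder_mean <- exp(mean(log(x)))
  } else {
    p <- 1 - epsilon
    holder_mean <- mean(x^p)^(1 / p)
  }
  1 - holder_mean / mu
}

# Two made-up toy populations: one perfectly equal, one very unequal
atkinson_index(c(10, 10, 10, 10), epsilon = 0.5)
atkinson_index(c(1, 2, 3, 100), epsilon = 0.5)
```

The equal population should give a value of zero up to floating-point error, and the unequal one a value strictly between 0 and 1.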
Verifying a stat from The Athletic NBA Show
A few weeks ago, I was listening to The Athletic NBA Show podcast (Episode 581: “5 Players I was wrong about, 20 Games in Contenders, and Sam Vecenie on the 2021 Rookie Class”) and the following statistic caught my attention: Question: Since the 2000-2001 NBA finals, there have been 42 teams in the NBA finals. (This episode aired in Dec 2021,...
4496 sym R (5675 sym/6 pcs) 8 img
Simulating dice bingo
Note: This post was inspired by the “Classroom Bingo” probability puzzle in the Royal Statistical Society’s Significance magazine (Dec 2021 edition). Set-up Imagine that we are playing bingo, but where the numbers are generated by the roll of two 6-sided dice with faces 1, 2, …, 6. Each round, the two dice are rolled. If the sum of the tw...
4208 sym R (2313 sym/5 pcs) 46 img
Playing Wordle in R
The game Wordle has taken the world (or at least my Facebook feed) by storm. It’s a really simple word game that’s a lot like the classic Mastermind. Here are the rules from the Wordle website: The logic behind the game is pretty simple, so I thought I’d code up an R version so that those of you who can’t get enough of it can play it on ...
4118 sym R (2974 sym/4 pcs) 3 img
What is the Bradley-Terry model?
The Bradley-Terry model, named after R. A. Bradley and M. E. Terry, is a probability model for predicting the outcome of a paired comparison. Imagine that we have $n$ teams competing against each other. The model assigns team $i$ a score $p_i$, with higher scores corresponding to better teams. Given two teams $i$ and $j$, the model asserts...
6950 sym R (7673 sym/15 pcs) 46 img
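As a small illustration of the Bradley-Terry model, the sketch below computes a win probability from two team scores. It assumes the common parameterisation with positive scores, in which the probability that team i beats team j is p_i / (p_i + p_j); the excerpt above is cut off before stating its formula, so the post may use an equivalent log-scale form instead. The function name `bt_win_prob` and the example scores are made up.

```r
# Hypothetical helper (not from the original post): Bradley-Terry win
# probability, ASSUMING positive scores and P(i beats j) = p_i / (p_i + p_j).
bt_win_prob <- function(score_i, score_j) {
  stopifnot(score_i > 0, score_j > 0)
  score_i / (score_i + score_j)
}

# Made-up scores: team A is rated three times as strong as team B
bt_win_prob(3, 1)  # probability that A beats B
bt_win_prob(1, 3)  # probability that B beats A

# Equivalent log-scale form: a logistic function of the score difference
plogis(log(3) - log(1))
```

The two win probabilities for a pair of teams always sum to 1, and equal scores give a probability of 0.5.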
Comparing the Bradley Terry model to betting odds
In this previous post, I described the Bradley-Terry model and showed how we could use it to predict game outcomes in the NBA 2018-19 regular season. After fitting the Bradley-Terry model on the first half of the regular season (with and without home advantage), I used the model to predict win probabilities for the second half of the season. The...
3432 sym R (4375 sym/10 pcs) 4 img
Switching testthat editions and how it affects testing functions and formulas
testthat is a popular R package used for unit testing. Starting with v3.0.0, testthat introduces the idea of “editions”. This is testthat’s way of maintaining backward compatibility. At the time of writing, the 3rd edition is the latest and incorporates the package developer’s latest recommendations, some of which may be backward-incompatible. I...
3050 sym R (2131 sym/8 pcs)
How to include all levels of a factor variable in a model matrix in R
In R, the model.matrix function is used to create the design matrix for regression. In particular, it is used to expand factor variables into dummy variables (also known as “one-hot encoding”). Let’s see this in action on the iris dataset: data(iris) str(iris) # 'data.frame': 150 obs. of 5 variables: # $ Sepal.Length: num 5.1 4.9 4.7 ...
2017 sym R (3010 sym/5 pcs)
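For the model.matrix post, here is a short sketch of one common way to keep a dummy column for every factor level. It passes a full indicator matrix through the `contrasts.arg` argument; this is an assumption about the approach and may not be exactly what the post does. The variable names `X_default` and `X_full` are hypothetical.

```r
data(iris)

# Default behaviour: the baseline level (setosa) is absorbed into the intercept
X_default <- model.matrix(~ Species, data = iris)
colnames(X_default)
# "(Intercept)" "Speciesversicolor" "Speciesvirginica"

# One way to keep every level: supply a full indicator (non-contrast) matrix
X_full <- model.matrix(
  ~ Species,
  data = iris,
  contrasts.arg = list(Species = contrasts(iris$Species, contrasts = FALSE))
)
colnames(X_full)
# "(Intercept)" "Speciessetosa" "Speciesversicolor" "Speciesvirginica"
```

Note that with an intercept and all three indicator columns, the columns of `X_full` are linearly dependent, which matters for unregularized regression.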
Changing the column names for model.matrix output
In this previous post, I showed how you can include a dummy variable for the baseline level in the output of the model.matrix function. In this post, I show how you can make changes to the column names of model.matrix’s output to make downstream parsing a little easier. Let’s use the iris dataset again: data(iris) str(iris) # 'data.frame': ...
1834 sym R (2823 sym/5 pcs)
Something to note when using the merge function in R
Base R has a merge function which does join operations on data frames. As the documentation says, the function [merges] two data frames by common columns or row names, or do other versions of database join operations. One thing that I realized which may not be obvious is that merge can have somewhat unexpected behavior regarding the ordering of ...
2012 sym R (3043 sym/5 pcs)
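For the merge post, the sketch below shows one ordering behavior that can surprise users, assuming the ordering in question is that of the rows (the excerpt is cut off before saying). By default, `merge` sorts the result on the `by` columns rather than keeping the row order of the first data frame. The data frames and the `row_order` helper column are made up for illustration.

```r
# Made-up toy data: df1 is deliberately not sorted by id
df1 <- data.frame(id = c(3, 1, 2), x = c("c", "a", "b"))
df2 <- data.frame(id = c(1, 2, 3), y = c(10, 20, 30))

# Default (sort = TRUE): result is sorted on the key, not in df1's row order
merged_sorted <- merge(df1, df2, by = "id")
merged_sorted$id
# 1 2 3

# One workaround: record the original row order, merge, then restore it
df1$row_order <- seq_len(nrow(df1))
restored <- merge(df1, df2, by = "id")
restored <- restored[order(restored$row_order), ]
restored$row_order <- NULL
restored$id
# 3 1 2
```

Setting `sort = FALSE` is not a substitute here, since the documentation describes the resulting row order as unspecified.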