Publications by kjytay

Estimating pi using the method of moments

14.03.2021

Happy Pi Day! I don’t encounter $\pi$ very much in my area of statistics, so this post might seem a little forced… In this post, I’m going to show one way to estimate $\pi$. The starting point is the integral identity $\int_0^1 \sqrt{1 - x^2}\,dx = \pi/4$. There are two ways to see why this identity is true. The first is that the integral is simply computing the area of a quarter-circle ...

1196 sym R (486 sym/2 pcs) 22 img
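The excerpt is cut off before the estimator itself, so the following is only a minimal sketch of one plausible reading: if X is uniform on [0, 1], the quarter-circle area gives E[sqrt(1 - X^2)] = pi/4, and replacing the expectation with a sample mean yields an estimate of pi. The seed and sample size are arbitrary choices made for illustration, not values from the post.

```r
# Sketch only: moment-style estimate of pi from uniform draws.
# Assumes X ~ Uniform(0, 1), so that E[sqrt(1 - X^2)] = pi / 4.
set.seed(314)   # arbitrary seed, for reproducibility only
n <- 1e5        # hypothetical sample size
x <- runif(n)

# Replace the expectation by the sample mean and solve for pi.
pi_hat <- 4 * mean(sqrt(1 - x^2))
pi_hat
```

With this many draws the estimate should typically agree with pi to about two decimal places. Increasing n shrinks the error at the usual 1/sqrt(n) rate.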

Is the EPL getting more unequal?

11.04.2021

I recently heard that Manchester City were so far ahead in the English Premier League (EPL) that the race for first was basically over, even though there were still about 6-7 more games to go (out of a total of 38 games). At the other end of the table, I heard that Sheffield United were so far behind that they were all but certain to be relegated. ...

4902 sym R (2262 sym/6 pcs) 28 img

What is the Tukey loss function?

23.04.2021

The Tukey loss function, also known as Tukey’s biweight function, is a loss function that is used in robust statistics. Tukey’s loss is similar to Huber loss in that it demonstrates quadratic behavior near the origin. However, it is even more insensitive to outliers because the loss incurred by large residuals is const...

2303 sym R (1131 sym/3 pcs) 22 img
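To make the two properties in the excerpt concrete (quadratic near the origin, constant for large residuals), here is a minimal sketch of the standard biweight loss. The function name `tukey_loss`, the argument name `cutoff`, and the default tuning constant 4.685 (a common textbook choice) are assumptions for illustration and may not match the post's notation.

```r
# Sketch only: Tukey's biweight loss with tuning constant `cutoff`.
# rho(r) = (cutoff^2 / 6) * (1 - (1 - (r / cutoff)^2)^3)  if |r| <= cutoff
#        = cutoff^2 / 6                                   otherwise
tukey_loss <- function(r, cutoff = 4.685) {
  inside <- abs(r) <= cutoff
  ifelse(inside,
         (cutoff^2 / 6) * (1 - (1 - (r / cutoff)^2)^3),
         cutoff^2 / 6)
}

# Near zero the loss is close to r^2 / 2; far from zero it is flat.
tukey_loss(c(0, 0.1, 10, 100))
```

Because the loss is capped at cutoff^2 / 6, a residual of 10 and a residual of 100 are penalized identically under the default cutoff.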

Introducing cvwrapr for your cross-validation needs

25.05.2021

TLDR: I’ve written an R package, cvwrapr, that helps users to cross-validate hyperparameters. The code base is largely extracted from the glmnet package. The R package is available for download from GitHub, and contains two vignettes which demonstrate how to use it. Comments, feedback and bug reports welcome! Imagine yourself in the following s...

5241 sym R (2114 sym/8 pcs) 8 img

Small gotcha when using negative indexing

26.05.2021

Negative indexing is a commonly used method in R to drop elements from a vector or rows/columns from a matrix that the user does not want. For example, the code below drops the third column from the matrix M:

    M <- matrix(1:9, nrow = 3)
    M
    #      [,1] [,2] [,3]
    # [1,]    1    4    7
    # [2,]    2    5    8
    # [3,]    3    6    9
    M[, -3]
    # ...

1472 sym R (919 sym/5 pcs)
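The excerpt ends before it names the gotcha, so the sketch below shows one well-known pitfall of negative indexing that may or may not be the one the post discusses: when the vector of indices to drop is empty, negating it still gives an empty index, and R then selects nothing rather than everything. The variable names `drop_idx` and `keep` are hypothetical.

```r
M <- matrix(1:9, nrow = 3)

# Dropping the third column works as expected: a 3 x 2 matrix remains.
dim(M[, -3])

# Assumed pitfall: an empty "columns to drop" vector.
drop_idx <- integer(0)
# -integer(0) is still integer(0), so this selects zero columns.
dim(M[, -drop_idx])

# One defensive alternative: compute the columns to keep explicitly.
keep <- setdiff(seq_len(ncol(M)), drop_idx)
dim(M[, keep, drop = FALSE])
```

Building the set of columns to keep avoids the special case, at the cost of a slightly longer expression.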

NBA playoffs: Visualizing win percentage by seeding

01.06.2021

Background: With the NBA playoffs going on, I’ve been thinking about the following question: A and B are about to play a game. We know that, among all players, A and B each have a known rank/seed. (A higher ranking/seeding corresponds to a smaller number, and players with higher rank/seed are better players.) Knowing only t...

6219 sym R (4114 sym/6 pcs) 24 img

Estimating win probability from best-of-7 series is not straightforward

07.06.2021

(Note: The code in this post is available here as a single R script.) Let’s say A and B play 7 games and A wins 4 of them. What would be a reasonable estimate for A’s win probability (over B)? A natural estimate would be 4/7. More generally, if A wins $x$ of the 7 games, then the estimator $x/7$ is an unbiased estimator of the true win probability $p$. This is n...

4025 sym R (3281 sym/7 pcs) 60 img
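The unbiasedness claim for the fixed 7-game setting can be checked by simulation. This is a minimal sketch, assuming all 7 games are always played and are independent with the same win probability. The true probability 0.6, the seed, and the number of simulations are arbitrary illustrative choices.

```r
# Sketch only: check by simulation that wins / 7 is unbiased
# when exactly 7 independent games are played.
set.seed(7)     # arbitrary seed, for reproducibility only
p <- 0.6        # hypothetical true win probability for A
n_sim <- 1e5    # hypothetical number of simulated 7-game sets

wins <- rbinom(n_sim, size = 7, prob = p)
p_hat <- wins / 7

# The average of the estimates should be close to p.
mean(p_hat)
```

Note that this covers only the case where all 7 games are played. A best-of-7 series stops once one side reaches 4 wins, which is a different sampling scheme.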

Documentation for internal functions

10.06.2021

tl;dr: To avoid triple colons and R CMD check --as-cran errors due to documentation examples for internal functions, enclose the example code in \dontrun{}. I recently encountered an issue when submitting an R package to CRAN that I couldn’t find a clean answer for. One of the comments from the manual check was the following: Using foo:::f inst...

2935 sym R (396 sym/2 pcs) 6 img
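As a rough illustration of the tl;dr, here is what such documentation might look like if the package uses roxygen2 comments. That is an assumption, since the excerpt does not say how the documentation is written. The package name `foo` and function `f` are the placeholders from the CRAN comment quoted above, and the function body is hypothetical.

```r
#' A hypothetical internal helper
#'
#' @param x A numeric vector.
#' @return `x` plus one.
#' @keywords internal
#' @examples
#' \dontrun{
#' # Not executed during checks, so the example is never run.
#' foo:::f(1)
#' }
f <- function(x) {
  x + 1
}
```

The `\dontrun{}` wrapper keeps the example visible in the help page while excluding it from the examples that the check executes.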

Using different fonts with ggplot2

08.07.2021

I was recently asked to convert all the fonts in my ggplot2-generated figures for a paper to Times New Roman. It turns out that this is easy, but it brought up a whole host of questions that I don’t have the full answer to. If you want to go all out with using custom fonts, I suggest looking into the extrafont and showtext packages. This post w...

2431 sym R (1037 sym/4 pcs) 6 img

Getting predictions from an isotonic regression model

29.07.2021

TLDR: Pass the output of the isoreg function to as.stepfun to make an isotonic regression model into a black box object that takes in uncalibrated predictions and outputs calibrated ones. Isotonic regression is a method for obtaining a monotonic fit for 1-dimensional data. Let’s say we have data $(x_1, y_1), \dots, (x_n, y_n)$ such that $x_1 < x_2 < \dots < x_n$. (We assume no ties among the $x_i$’s fo...

2772 sym R (980 sym/6 pcs) 20 img
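The TLDR can be sketched with base R alone, since both `isoreg` and `as.stepfun` live in the stats package. The simulated data and the name `calibrate` below are made up for illustration; in practice `x` would hold the uncalibrated predictions and `y` the observed outcomes.

```r
# Sketch only: turn an isotonic regression fit into a prediction function.
set.seed(1)                    # arbitrary seed, for reproducibility only
x <- sort(runif(50))           # stand-in for uncalibrated predictions
y <- x + rnorm(50, sd = 0.1)   # stand-in for noisy outcomes

fit <- isoreg(x, y)            # monotone (non-decreasing) fit
calibrate <- as.stepfun(fit)   # step function usable on new inputs

# Apply the fitted step function to new uncalibrated values.
new_x <- c(0.25, 0.5, 0.75)
calibrate(new_x)
```

Because the result is an ordinary R function, it can be applied to any numeric vector, including values outside the range of the training data, where it returns the first or last fitted level.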