Publications by Theory meets practice...
Estimating the Size of a Demonstration
Abstract Inspired by the recent March For Science we look into methods for the statistical estimation of the number of people participating in a demonstration organized as a march. In particular, we provide R code to reproduce the two on-the-spot counting method analysis of Yip et al. (2010) for the data of the July 1 March in Hong Kong 2006. ...
9206 sym R (921 sym/3 pcs) 12 img 1 tbl
Beware the Argument: The Flint Water Crisis and Quantiles
Abstract If your tap water suddenly becomes brown while authorities claim everything is okay, you start to worry. Langkjær-Bain (2017) tells the Flint Water Crisis story from a statistical viewpoint: essentially the interest is in whether the 90th percentile in a sample of lead concentration measurements is above a certain threshold or not. We i...
13332 sym R (3200 sym/7 pcs) 14 img
Confidence Intervals without Your Collaborator’s Tears
Abstract We provide an interpretation for the confidence interval for a binomial proportion hidden as the transcript of a hypothetical statistical consulting session. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under a GNU General Pu...
8529 sym R (2092 sym/6 pcs) 10 img
Pair Programming Statistical Analyses
Abstract Control calculation ping-pong is the process of iteratively improving a statistical analysis by comparing results from two independent analysis approaches until agreement. We use the daff package to simplify the comparison of the two results and illustrate its use by a case study with two statisticians ping-ponging an analysis using dply...
7751 sym R (5554 sym/12 pcs) 8 img 1 tbl
Inkognito – Sequential Bayesian Identity Disclosure
Abstract We provide Bayesian decision support for revealing the identity of opponents in the board game Inkognito. This includes the use of combinatorics to deduce the likelihood of observing a particular configuration and a sequential Bayesian belief updating scheme to infer each opponent’s identity. From an R point of view we use base R where it’...
10610 sym R (4164 sym/7 pcs) 8 img
Safe Disposal of Unexploded WWII Bombs
Abstract Unexploded WWII bombs are ticking threats despite being dropped more than 70 years ago. In this post we explain how statistical methods are used to plan the search and disposal of unexploded WWII bombs. In particular we consider and exemplify the non-parametric nearest neighbour distance (NND) method implemented in the R package highrisk...
15040 sym R (2153 sym/7 pcs) 10 img
Retracing Prenatal Testing Algorithms
Abstract A good understanding of the statistical procedure used to calculate trisomy 21 (Down syndrome) risk in combined first trimester screening is a precondition for taking an informed decision on how to proceed with the screening results. For this purpose we implement the Fetal Medicine Foundation (FMF) Germany procedure described in Merz et ...
20838 sym R (1132 sym/3 pcs) 8 img 2 tbl
Factfulness: Building Gapminder Income Mountains
Abstract: We work out the math behind the so-called income mountain plots used in the book “Factfulness” by Hans Rosling and use these insights to generate such plots using tidyverse code. The trip includes a mixture of log-normals, the density transformation theorem, histogram vs. density and then skipping all those details again to make nice...
11452 sym R (5799 sym/6 pcs) 16 img
World Income, Inequality and Murder
Abstract: We follow up on last week’s post on using Gapminder data to study the world’s income distribution. In order to assess the inequality of the distribution we compute the Gini coefficient for the world’s income distribution by Monte Carlo approximation and visualize the result as a time series. Furthermore, we animate the association be...
7794 sym R (1613 sym/2 pcs) 14 img