Publications by Theory meets practice...

Surveillance Out of the Box – The #Zombie Experiment

24.09.2016

Abstract We perform a social experiment to investigate whether zombie-related Twitter posts can be used as a reliable indicator for an early warning system. We show how such a system can be set up almost out-of-the-box using R, a free software environment for statistical computing and graphics. Warning: This blog entry contains toxic doses of Danish ...

9731 sym R (4450 sym/5 pcs) 18 img

Cartograms with R

09.10.2016

Abstract We show how to create cartograms with R by illustrating the population and age-distribution of the planning regions of Berlin by static plots and animations. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under a GNU General Pub...

5029 sym R (2862 sym/5 pcs) 12 img

Better Confidence Intervals for Quantiles

22.10.2016

Abstract We discuss the computation of confidence intervals for the median or any other quantile in R. In particular we are interested in the interpolated order statistic approach suggested by Hettmansperger and Sh...

14784 sym R (3863 sym/10 pcs) 8 img 1 tbl

4×3 R-Hackathoning – The Finisher’s Guide

11.12.2016

Abstract We present experiences from organizing a small R hackathon aimed at advancing knowledge and documentation of the R package surveillance. The hackathon was piggybacked on the ESCAIDE2016 conference, which is attended by current and potential package users in the area of infectious disease epidemiology. The output of the hackathon is available at htt...

9985 sym R (70 sym/1 pcs) 6 img

suRprise! – Classifying Kinder Eggs by Boosting

22.12.2016

Abstract Carrying the Danish tradition of Juleforsøg to the realm of statistics, we use R to classify the figure content of Kinder Eggs using boosted regression trees for the egg’s weight and possible rattling noises. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source ...

6751 sym R (2932 sym/4 pcs) 16 img

Naming Uncertainty by the Bootstrap

05.02.2017

Abstract Data on the names of all newborn babies in Berlin 2016 are used to illustrate how a scientific treatment of chance could enhance rank statements in, e.g., onomastics investigations. For this purpose, we first identify different stages of the naming-your-baby process, which are influenced by chance. Second, we compute confidence intervals...

11304 sym R (2835 sym/5 pcs) 16 img 1 tbl

Happy pbirthday class of 2016

12.02.2017

Abstract Continuing the analysis of first names given to newborns in Berlin 2016, we solve the following problem: what is the probability that, in a school class of size \(n\) of these kids, at least two kids have the same first name? For classes of size 26 the answer is 34%, and the problem can be solved as an instance of the bi...

8206 sym R (1930 sym/8 pcs) 12 img

US Babyname Collisions 1880-2014

28.02.2017

Abstract We use US Social Security Administration data to compute the probability of a name clash in a class of year-YYYY born kids during the years 1880-2014. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under a GNU General Public Lic...

2535 sym R (247 sym/3 pcs) 12 img

Did Mary and John go West?

05.03.2017

Abstract As a final post in the baby-names-the-data-scientist’s-way series, we use the US Social Security Administration 1910-2015 data to visualize, in space and time, the most popular baby name for girls and boys, respectively. The code partly uses the new simple features package (sf) in order to get some first experience with the packag...

3384 sym R (1541 sym/4 pcs) 10 img

On a First Name Basis with Statistics Sweden

24.03.2017

Abstract Judging from recent R-Bloggers posts, it appears that many data scientists are concerned with scraping data from various media sources (Wikipedia, Twitter, etc.). However, one should be aware that well-structured, high-quality datasets are available through state and national bureaus of statistics. Increasingly these are offered...

4897 sym R (1432 sym/7 pcs) 8 img