Publications by free range statistics - R

Analysing large data on your laptop with a database and R by @ellis2013nz

21.12.2019

The New York City Taxi & Limousine Commission’s open data of taxi trip records is rightly a go-to test piece for analytical methods for largish data. Check out, for example: The benchmark of interesting analysis is set by Todd W Schneider’s well-written and rightly famous blog post ‘Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Ven...

19512 sym R (10469 sym/5 pcs) 20 img

Analysing the effectiveness of tennis tournament seeding by @ellis2013nz

25.01.2020

So, an exploration of how tennis tournament seeding and bracketing impacts on the end result has percolated to the top of my to-do list, inspired by the Melbourne Open currently in play. This is the first of what will probably be two posts on this topic. Today I wanted to look at the impact of seeding on the chance of the best players finishing f...

7401 sym R (10971 sym/7 pcs) 6 img 1 tbl

Lewis Carroll’s proposed rules for tennis tournaments by @ellis2013nz

31.01.2020

Last week I wrote about the impact of seeding the draw in a tennis tournament. Seeding is one way to increase the chance of the top players making it to the final rounds of a single elimination tournament, leading to fairer outcomes and to a higher chance of the best matchups happening in the finals. Basically it’s to overcome this problem: ...

11540 sym R (16125 sym/1 pcs) 10 img

Body Mass Index by @ellis2013nz

22.02.2020

BMI has an expectations management problem. Body Mass Index (BMI) is an attempt to give a quick back-of-envelope answer to the question “if someone weighs W kg, is that a lot or not very much?” Clearly the answer to that question has to take into account at a minimum the person’s height; in general, whatever may constitute a healthy weight, ...

14298 sym R (8795 sym/3 pcs) 6 img

Log transform or log link? And confounding variables. by @ellis2013nz

29.02.2020

Last week I wrote about the relationship between weight and height in US adults, as seen in the US Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System, an annual telephone survey of around 400,000 interviews per year. In particular, I tested the widely-circulated claim that Body Mass Index (BMI) exaggerates...

9858 sym R (5962 sym/3 pcs) 8 img 1 tbl

New Zealand Election Study webtool by @ellis2013nz

06.03.2020

I’ve just finished updating and deploying a webtool that helps explore data from the New Zealand Election Study. I first built a version of this a few years back with just the 2014 wave of the study; today I’ve added the data from the time of the 2017 election and made a number of small improvements (e.g. getting the macrons back into the ‘M...

4181 sym 18 img

COVID-19 cumulative observed case fatality rate over time by @ellis2013nz

16.03.2020

Preamble: I was slightly reluctant to add to the deluge of charts about the COVID-19 outbreak, but on the other hand making charts is one of the ways I relax and try to understand what’s going on around me. So first, to get out of the way my only advice at this point: wash hands frequently, for 20 seconds at a time, with plenty of soap; work at ...

3849 sym R (3876 sym/1 pcs) 4 img

Impact of a country’s age breakdown on COVID-19 case fatality rate by @ellis2013nz

20.03.2020

Italy is routinely and correctly described as particularly vulnerable to COVID-19 because of its older age profile. I set out to understand for myself how important this factor is. What would happen if the case fatality rates observed in Italy were applied to demographic profiles of other countries? Fatality rates by age and sex so far in Italy: T...

4997 sym R (5712 sym/1 pcs) 8 img

How to make that crazy Fox News y axis chart with ggplot2 and scales by @ellis2013nz

05.04.2020

Possibly you have seen this graphic circulating on social media. At first it doesn’t look too remarkable, but then you notice the vertical axis. Oh. The gridlines are equally spaced on the page, but sometimes the same space represents 30 people, sometimes 10, and sometimes 50. It isn’t even strictly increasing – which might be expected if s...

4434 sym R (2754 sym/4 pcs) 8 img

Pragmatic prediction intervals from a quasi-likelihood GLM by @ellis2013nz

17.04.2020

Today’s blog comes with two lessons: a statistical one, and one on troubleshooting. Rubber-duck debugging: Troubleshooting lesson first. The problem described below has been causing me grief for several days. I have an important and time-critical use case at work where I need to impute lots of complex missing data using the sort of model I’m a...

9234 sym R (1841 sym/4 pcs) 4 img