Publications by free range statistics - R

House effects, herding, and the last few days before the election by @ellis2013nz

14.05.2019

So, we’re down to the last few days before the Australian federal election, the first one that I’ve been tracking polls and making forecasts for. I thought I’d address a couple of points raised on Twitter about my forecasts. My forecasts are generally a bit more sceptical of a clean ALP win (ie 76 or more seats in the House of Representativ...

8771 sym R (2644 sym/3 pcs) 12 img 1 tbl

Polls v results by @ellis2013nz

18.05.2019

Like most people, I was surprised at yesterday’s result for the Australian federal election. But I was much less surprised than most people. The two-party-preferred vote for the ALP (currently 49.11%, with a lot of pre-polling still to count) came down at the bottom of my prediction interval but was still inside it; and they look to have a chan...

5238 sym R (1498 sym/1 pcs) 2 img

Too important to leave to the data scientists by @ellis2013nz

27.06.2019

I usually write blog posts that include big chunks of R code and deal with analysis of specific datasets that I hope are of interest to people with specialist statistical and data science skills (or hoping to develop those skills). But I happen to think that broader data literacy is even more important than my hobby. This week I released an artic...

2023 sym

Time series forecast cross-validation by @ellis2013nz

19.07.2019

Time series cross-validation is an important part of the toolkit for good evaluation of forecasting models. forecast::tsCV makes it straightforward to implement, even with different combinations of explanatory regressors in the different candidate models for evaluation. Spurious correlation between time series is a well documented and mocked proble...

9363 sym R (2624 sym/6 pcs) 2 img 2 tbl

Forecasting unemployment by @ellis2013nz

27.07.2019

Last week I wrote about time-series cross-validation, and mentioned that my original motivation was forecasting unemployment. Actually, I’m interested in nowcasting unemployment rates – that is, estimating a value for a current or recently past period, before the official statistic becomes available. In particular, I’ve been wondering about...

16922 sym R (14223 sym/5 pcs) 20 img 5 tbl

Inferring a continuous distribution from binned data by @ellis2013nz

24.08.2019

Today’s post comes from an idea and some starting code by my colleague David Diviny from Nous Group. A common real-world problem is trying to estimate an unknown continuous variable from data that has been published in lumped-together bins. Often this will have been done for confidentialisation reasons; or it might just be that it has been aggr...

12287 sym R (8755 sym/5 pcs) 12 img 2 tbl

Poisson point processes, mass shootings and clumping by @ellis2013nz

06.09.2019

Did the average rate of Australian mass shootings decline after 1996, or was the drop just chance? I recently came across this letter to the Annals of Internal Medicine by Simon Chapman, Michael Stewart, Philip Alpers and Michael Jones: Fatal Firearm Incidents Before and After Australia’s 1996 National Firearms Agreement Banning Semiautomatic Ri...

14434 sym R (6098 sym/6 pcs) 8 img 1 tbl

Re-creating survey microdata from marginal totals by @ellis2013nz

02.11.2019

I recently did some pro bono work for Gun Control NZ reviewing the analysis by a market research firm of the survey that led to this media release: “Most New Zealanders back stronger gun laws”. The analysis all checked out ok. The task at that time was to make sure that any claims about different perceptions of different groups in New Zealand...

20724 sym R (21972 sym/11 pcs) 4 img 3 tbl

A small simple random sample will often be better than a huge not-so-random one by @ellis2013nz

08.11.2019

An interesting big data thought experiment. The other day on Twitter I saw someone referencing a paper or a seminar or something that was reported to examine the following situation: if you have an urn with a million balls in it of two colours (say red and white) and you want to estimate the proportion of balls that are red, are you better off tak...

7733 sym R (6404 sym/6 pcs) 6 img

Cost-benefit analysis in R by @ellis2013nz

23.11.2019

Even organisations and people that use a statistical package for hard-core data and econometric analysis might use spreadsheets for financial and economic scenario and evaluation models. There’s something about the ad hoc nature of financial models in particular that seems to tempt people towards Excel not only for prototyping but even for end ...

11802 sym R (9174 sym/7 pcs) 12 img