Publications by free range statistics - R
House effects, herding, and the last few days before the election by @ellis2013nz
So, we’re down to the last few days before the Australian federal election, the first one that I’ve been tracking polls and making forecasts for. I thought I’d address a couple of points raised on Twitter about my forecasts. My forecasts are generally a bit more sceptical of a clean ALP win (ie 76 or more seats in the House of Representativ...
8771 sym R (2644 sym/3 pcs) 12 img 1 tbl
Polls v results by @ellis2013nz
Like most people, I was surprised at yesterday’s result for the Australian federal election. But I was much less surprised than most people. The two-party-preferred vote for the ALP (currently 49.11%, with a lot of pre-polling still to count) came down at the bottom of my prediction interval but was still inside it; and they look to have a chan...
5238 sym R (1498 sym/1 pcs) 2 img
Too important to leave to the data scientists by @ellis2013nz
I usually write blog posts that include big chunks of R code and deal with analysis of specific datasets that I hope are of interest to people with specialist statistical and data science skills (or hoping to develop those skills). But I happen to think that broader data literacy is even more important than my hobby. This week I released an artic...
2023 sym
Time series forecast cross-validation by @ellis2013nz
Time series cross-validation is an important part of the toolkit for good evaluation of forecasting models. forecast::tsCV makes it straightforward to implement, even with different combinations of explanatory regressors in the different candidate models for evaluation. Spurious correlation between time series is a well documented and mocked proble...
9363 sym R (2624 sym/6 pcs) 2 img 2 tbl
Forecasting unemployment by @ellis2013nz
Last week I wrote about time-series cross-validation, and mentioned that my original motivation was forecasting unemployment. Actually, I’m interested in nowcasting unemployment rates – that is, estimating a value for a current or recently past period, before the official statistic becomes available. In particular, I’ve been wondering about...
16922 sym R (14223 sym/5 pcs) 20 img 5 tbl
Inferring a continuous distribution from binned data by @ellis2013nz
Today’s post comes from an idea and some starting code by my colleague David Diviny from Nous Group. A common real-world problem is trying to estimate an unknown continuous variable from data that has been published in lumped-together bins. Often this will have been done for confidentialisation reasons; or it might just be that it has been aggr...
12287 sym R (8755 sym/5 pcs) 12 img 2 tbl
Poisson point processes, mass shootings and clumping by @ellis2013nz
Did the average rate of Australian mass shootings decline after 1996, or was the drop just chance? I recently came across this letter to the Annals of Internal Medicine by Simon Chapman, Michael Stewart, Philip Alpers and Michael Jones: Fatal Firearm Incidents Before and After Australia’s 1996 National Firearms Agreement Banning Semiautomatic Ri...
14434 sym R (6098 sym/6 pcs) 8 img 1 tbl
Re-creating survey microdata from marginal totals by @ellis2013nz
I recently did some pro bono work for Gun Control NZ reviewing the analysis by a market research firm of the survey that led to this media release: “Most New Zealanders back stronger gun laws”. The analysis all checked out ok. The task at that time was to make sure that any claims about different perceptions of different groups in New Zealand...
20724 sym R (21972 sym/11 pcs) 4 img 3 tbl
A small simple random sample will often be better than a huge not-so-random one by @ellis2013nz
An interesting big data thought experiment. The other day on Twitter I saw someone referencing a paper or a seminar or something that was reported to examine the following situation: if you have an urn with a million balls in it of two colours (say red and white) and you want to estimate the proportion of balls that are red, are you better off tak...
7733 sym R (6404 sym/6 pcs) 6 img
Cost-benefit analysis in R by @ellis2013nz
Even organisations and people that use a statistical package for hard-core data and econometric analysis might use spreadsheets for financial and economic scenario and evaluation models. There’s something about the ad hoc nature of financial models in particular that seems to tempt people towards Excel not only for prototyping but even for end ...
11802 sym R (9174 sym/7 pcs) 12 img