Publications by Peter's stats stuff - R

Analysing the Modelled Territorial Authority GDP estimates for New Zealand


At the conference of the New Zealand Association of Economists (NZAE) in late June 2016 I gave a paper on Modelled Territorial Authority Gross Domestic Product, a new dataset my team developed last year in my day job. See the official website of the MTAGDP project for definitive information. I briefly blogged about the data when it was first re...

Why you need version control


I recently had an email exchange with a seasoned, well-respected analytical professional which included the following (from them, not me): “… my versioning is to have multiple versions of files and to use naming conventions… it works really well.” This is a very smart, competent researcher who has delivered great results, doing innovative...

New Zealand Election Study individual level data


Individual level data is essential to understand voting behaviour

My previous analysis has occasionally come up against the problem that “only individual level data could resolve that”. Since I last wrote that, the New Zealand Election Study data for the 2014 General Election have become available, and this post is my first glance at them. The Ne...

Update of `ggseas` for seasonal decomposition on the fly


What’s new

A new version (0.5.1) of my `ggseas` R package is now available on CRAN. `ggseas` is a small package that provides several tools to make it easier to do seasonal adjustment or decomposition of time series on the fly in a ggplot2 pipeline. New in this version:

- A `facet.titles` argument to the decomposition graphic function `ggsdc`
- Addition...

Statistics New Zealand experimental API initiative


Exciting experimental API to access New Zealand official statistics

Statistics New Zealand have released an exciting experiment in accessing data in JSON format over the web via an application programming interface (API). It looks to be time series data that is usually provided over the solid but dated Infoshare interface, which has only clunky ...

Tourism forecasting competition data in the Tcomp R package


Tourism competition data

The tourism forecasting competition described in Athanasopoulos et al. (2011) was an important investigation into domain-specific time-series forecasting; a different approach from the broader-scope “M” series forecasting competitions which covered multiple areas. Tourism is important to me in my day job and the artic...

FiveThirtyEight’s polling data for the US Presidential election


3,000+ voting intention surveys

Like many others around the world, I have been watching with interest the democratic process in the United States of America. One of the most influential and watched websites is FiveThirtyEight, headed by Nate Silver, author of the excellent popularisation of the craft of statistical time series forecasting Th...

Timeseries forecasting using extreme gradient boosting


In the last few years there have been more attempts at a fresh approach to statistical timeseries forecasting using the increasingly accessible tools of machine learning. This means methods like neural networks and extreme gradient boosting, as supplements to or even replacements for the more traditional tools like auto-regressive integrated moving ...

Extreme pie chart polishing


The usual response from statisticians and data professionals to pie charts ranges from lofty disdain to outright snobbery. But sometimes I think they’re the right tool for communication with a particular audience. Like others I was struck by this image from a New Zealand news site showing that nearly half the earthquake energy of th...

Earthquake energy over time


Disclaimer on all that follows – I am not an earthquake scientist and have cobbled together this post from sources like Wikipedia, official open data, and a range of information sites. There may be mistakes and misinterpretations that follow.

Energy release from earthquakes is extremely variable

My last blog post left me interested in finding ...
