Publications by Peter's stats stuff - R

Does seasonally adjusting first help forecasting?

21.01.2017

The experiment: A colleague at work was working with a time series where one got quite different results depending on whether one seasonally adjusted it first, or treated the seasonality as part of a SARIMA (seasonal auto-regressive integrated moving average) model. I have some theories about why this might have happened which I won’t go into ...

9392 sym R (7811 sym/1 pcs) 6 img 5 tbl

US Presidential inauguration speeches

22.01.2017

There’s rightly been a lot of attention paid to USA President Trump’s first speech as President at the time of his inauguration. The speech broke with tradition by extending the acrimonious atmosphere of the election campaign into the Presidency. There’s been an instant flurry of analysis, with the Washington Post leading the way with thi...

7461 sym R (6641 sym/1 pcs) 6 img

Moving largish data from R to H2O – spam detection with Enron emails

17.02.2017

Moving around sparse matrices of text data – the limitations of as.h2o: This post is the resolution of a challenge I first wrote about in late 2016, moving large sparse data from an R environment onto an H2O cluster for machine learning purposes. In that post, I experimented with functionality recently added by the H2O team to their supporting ...

8400 sym R (15175 sym/11 pcs) 2 img

Success rates of appeals to the Supreme Court by Circuit

25.02.2017

In the chaos of the last month or so of United States of America governance, one item that grabbed my attention was the claim by President Trump that 80% of appeals decided by the Ninth Circuit Court of Appeals are overturned by the Supreme Court of the United States (SCOTUS): “In fact, we had to go quicker than we thought because of the bad de...

7079 sym R (5753 sym/4 pcs) 8 img

Visualising relationships between children’s books

03.03.2017

At the OZCOTS (Australian Conference on Teaching Statistics) in late 2016 George Cobb gave a great talk entitled “Ask not what data science can do for the Humanities. Ask rather, what the Humanities can do for data science.” “George Cobb of Mt Holyoke College gave this keynote address to open the Australian Conference on Teaching Statistic...

8539 sym R (1555 sym/1 pcs) 8 img

New data and functions in nzelect 0.3.0 R package

10.03.2017

Polling data and other goodies ready for download: A new version, 0.3.0, of the nzelect R package is now available on CRAN. It includes historical polling data from 2002 to February 2017, sourced from Wikipedia; some small functions to help convert voting numbers into seats in a New Zealand or similar proportional representation system; and to weight polling ...

6997 sym R (6407 sym/4 pcs) 10 img

Simulations to explore excessive lagged X variables in time series modelling

11.03.2017

I was once in a meeting discussing a time series modelling and forecasting challenge where it was suggested that “the beauty of regression is you just add in more variables and more lags of variables and try the combinations until you get something that fits really well”. Well, no, it doesn’t work like that; at least not in many cases with...

10869 sym R (6308 sym/4 pcs) 8 img

House effects in New Zealand voting intention polls

20.03.2017

This post is one of a series leading up to a purely data-driven probabilistic prediction model for the New Zealand general election in 2017. No punditry will be indulged in (if only to avoid complications with my weekday role as an apolitical public servant)! This is straight statistics, if there is such a thing… There are important sources o...

8154 sym R (5704 sym/1 pcs) 8 img 1 tbl

New Zealand election forecasts

25.03.2017

Over the weekend I released a new webpage, connected to this blog, with forecasts for the New Zealand 2017 General Election. The aim is to go beyond poll aggregation to something that takes the uncertainty of the future into account, as well as relatively minor issues such as the success (or not) of different polling firms predicting results in ...

7696 sym 10 img

Exploring propensity score matching and weighting

08.04.2017

This post jots down some playing around with the pros, cons and limits of propensity score matching or weighting for causal social science research. Intro to propensity score matching: One is often faced with an analytical question about causality and effect sizes when the only data around is from a quasi-experiment, not the random controlled tri...

15760 sym R (6337 sym/13 pcs) 6 img