Publications by Peter's stats stuff - R
Does seasonally adjusting first help forecasting?
The experiment: A colleague at work was working with a time series where one got quite different results depending on whether one seasonally adjusted it first, or treated the seasonality as part of a SARIMA (seasonal auto-regressive integrated moving average) model. I have some theories about why this might have happened which I won’t go into ...
US Presidential inauguration speeches
There’s rightly been a lot of attention paid to USA President Trump’s first speech as President at the time of his inauguration. The speech broke with tradition by extending the acrimonious atmosphere of the election campaign into the Presidency. There’s been an instant flurry of analysis, with the Washington Post leading the way with thi...
Moving largish data from R to H2O – spam detection with Enron emails
Moving around sparse matrices of text data – the limitations of as.h2o: This post is the resolution of a challenge I first wrote about in late 2016, moving large sparse data from an R environment onto an H2O cluster for machine learning purposes. In that post, I experimented with functionality recently added by the H2O team to their supporting ...
Success rates of appeals to the Supreme Court by Circuit
In the chaos of the last month or so of United States of America governance, one item that grabbed my attention was the claim by President Trump that 80% of appeals decided by the Ninth Circuit Court of Appeals are overturned by the Supreme Court of the United States (SCOTUS): “In fact, we had to go quicker than we thought because of the bad de...
Visualising relationships between children’s books
At the OZCOTS (Australian Conference on Teaching Statistics) in late 2016 George Cobb gave a great talk entitled “Ask not what data science can do for the Humanities. Ask rather, what the Humanities can do for data science.” “George Cobb of Mt Holyoke College gave this keynote address to open the Australian Conference on Teaching Statistic...
New data and functions in nzelect 0.3.0 R package
Polling data and other goodies ready for download: A new version, 0.3.0, of the nzelect R package is now available on CRAN. New in this version: historical polling data from 2002 to February 2017, sourced from Wikipedia; some small functions to help convert voting numbers into seats in a New Zealand or similar proportional representation system; and to weight polling ...
Simulations to explore excessive lagged X variables in time series modelling
I was once in a meeting discussing a time series modelling and forecasting challenge where it was suggested that “the beauty of regression is you just add in more variables and more lags of variables and try the combinations until you get something that fits really well”. Well, no, it doesn’t work like that; at least not in many cases with...
House effects in New Zealand voting intention polls
This post is one of a series leading up to a purely data-driven probabilistic prediction model for the New Zealand general election in 2017. No punditry will be indulged in (if only to avoid complications with my weekday role as an apolitical public servant)! This is straight statistics, if there is such a thing… There are important sources o...
New Zealand election forecasts
Over the weekend I released a new webpage, connected to this blog, with forecasts for the New Zealand 2017 General Election. The aim is to go beyond poll aggregation to something that takes the uncertainty of the future into account, as well as relatively minor issues such as the success (or not) of different polling firms predicting results in ...
Exploring propensity score matching and weighting
This post jots down some playing around with the pros, cons and limits of propensity score matching or weighting for causal social science research. Intro to propensity score matching: One is often faced with an analytical question about causality and effect sizes when the only data around is from a quasi-experiment, not the random controlled tri...