Publications by Peter's stats stuff - R
Skill v luck in determining backgammon winners
Getting backgammon data out of XG-Gammon: Backgammon is a game that combines chance and skill, and everyone who comes across it asks “how much is luck, and how much skill?” The answer, of course, is “it depends”. Consider that for two exactly equally skilled players, the result appears to be 100% luck and the best forecast for a result is...
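A toy calculation can make “it depends” concrete. It assumes, purely for illustration, that games are independent and that the stronger player wins each one with a fixed probability p. Real backgammon has match scores and the doubling cube, which this ignores. The function name p_majority() is hypothetical.

```r
# Toy model (an assumption, not real backgammon match scoring): every game is
# independent and the stronger player wins each game with probability p.
# Returns the probability that the stronger player wins a majority of
# n_games games, i.e. more than floor(n_games / 2) of them.
p_majority <- function(p, n_games) {
  stopifnot(n_games %% 2 == 1)  # odd series length, so no ties are possible
  1 - pbinom(floor(n_games / 2), size = n_games, prob = p)
}

# Equal skill: the series outcome is a coin flip, whatever its length
p_majority(0.5, 7)

# A modest per-game edge counts for more as the series gets longer
sapply(c(1, 5, 25, 101), function(n) p_majority(0.55, n))
```

Under this simplification, equal skill leaves the result entirely to chance, while any skill gap shows up more clearly the more games are played.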
Seasonal decomposition in the ggplot2 universe with ggseas
The ggseas package for R, which provides convenient treatment of seasonal time series in the ggplot2 universe, was first released by me in February 2016 and has since been enhanced in several ways. The latest version, 0.4.0, is now on CRAN. The improvements since I last blogged about ggseas include: added the convenience function tsdf() to c...
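For readers new to seasonal decomposition, a minimal base-R sketch using stats::stl() shows the split into seasonal, trend and remainder components that seasonal-decomposition plots display. The AirPassengers data set and the s.window setting are illustrative choices, and this is not the ggseas API.

```r
# STL: seasonal-trend decomposition by loess, from base R's stats package.
y <- log(AirPassengers)               # monthly series, frequency 12
fit <- stl(y, s.window = "periodic")  # same seasonal pattern every year

comps <- fit$time.series              # columns: seasonal, trend, remainder
head(comps)

# The three components add back up to the original (logged) series
recon <- comps[, "seasonal"] + comps[, "trend"] + comps[, "remainder"]
max(abs(recon - y))
```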
Election analysis contest entry part 1 – introducing the nzelect R package
The contest: Inspired by Ari Lamstein’s R Election Analysis Contest, I’ve fast-tracked a project that’s been at the back of my mind for a while: making a range of data about New Zealand elections available in a friendly, tidy R package. My entry for the contest will involve 3 or 4 posts over the next week or so: Today’s post, introducin...
Election analysis contest entry part 2 – building the nzelect R package
Motivation: This post is the second in a series that makes up my entry in Ari Lamstein’s R Election Analysis Contest. Yesterday I introduced the nzelect R package from a user perspective. Today I’m writing about how that package is built. This might be of interest to someone planning on doing something similar, or to anyone who wan...
Election analysis contest entry part 3 – interactive exploration of voting locations with leaflet and Shiny
Motivation: This post is the third in a series that makes up my entry in Ari Lamstein’s R Election Analysis Contest. First I introduced the nzelect R package from a user perspective. Second was a piece on how that package is built. Today’s post, the third in the series, introduces an interactive map of results by voting location drawing on th...
Election analysis contest entry part 4 – drivers of preference for Green over Labour party
Motivation: This post is the fourth in a series that makes up my entry in Ari Lamstein’s R Election Analysis Contest. Earlier posts introduced the nzelect R package, its basic usage, how it was built, and an exploratory Shiny web application. Today I follow up on a discussion on the StatsChat blog. A post there showed a screenshot from my Shiny app,...
Announcing new forecastHybrid package
Background and motivation: In an earlier post I explored ways that might improve on standard methods for prediction intervals from univariate time series forecasting. One of the tools I used was a convenience function to combine forecasts from Rob Hyndman’s ets and auto.arima functions. David Shaub (with a small contribution from myself) has n...
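The basic combination idea can be sketched with the forecast package directly, assuming a simple equal-weight average of the ets() and auto.arima() point forecasts. This is not the forecastHybrid interface, and the AirPassengers series and 24-month horizon are illustrative choices only.

```r
library(forecast)  # provides ets(), auto.arima() and forecast()

# Illustrative series and horizon; not the data used in the post
y <- AirPassengers
h <- 24

fc_ets   <- forecast(ets(y), h = h)
fc_arima <- forecast(auto.arima(y), h = h)

# Equal-weight average of the two point forecasts (a ts object)
combined <- (fc_ets$mean + fc_arima$mean) / 2
combined
```

Equal weights are the simplest choice; each combined value necessarily lies between the two component forecasts.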
Minimalist Tufte-inspired axis text with Scottish-New Zealand historical material
Minimalist axes text and tick marks: One of the ideas in Edward Tufte’s The Visual Display of Quantitative Information was to minimise non-data-ink by dropping the regular text labelling values on axis guides, and instead using the axis guides to mark the values of the actual data. For those of you with the book in front of you, I’m thinking ...
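A minimal base-R sketch of the idea places tick marks and labels only at the observed data values. The helper name plot_data_axes() and the numbers are hypothetical, and the post's own plotting code is not shown in this excerpt.

```r
# Hypothetical helper: scatter plot whose axes mark only the observed values
plot_data_axes <- function(x, y, ...) {
  plot(x, y, axes = FALSE, pch = 19, ...)  # suppress the default axes
  xb <- sort(unique(x))
  yb <- sort(unique(y))
  # lwd = 0 drops the axis line; lwd.ticks = 1 keeps a tick at each value
  axis(1, at = xb, labels = xb, lwd = 0, lwd.ticks = 1)
  axis(2, at = yb, labels = yb, lwd = 0, lwd.ticks = 1, las = 1)
  invisible(list(x = xb, y = yb))
}

pdf(NULL)  # draw to a null device; use a screen or file device in practice
br <- plot_data_axes(c(1, 4, 9, 16), c(2, 3, 5, 7), xlab = "x", ylab = "y")
dev.off()
```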
Visual contrast of two robust regression methods
Robust regression: For training purposes, I was looking for a way to illustrate some of the differing properties of two robust estimation methods for linear regression models. The two methods I’m looking at are: least trimmed squares, implemented as the default option in lqs(); and a Huber M-estimator, implemented as the default option in...
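A minimal sketch of fitting both methods, assuming the MASS package: lqs() there defaults to least trimmed squares, and MASS::rlm() with its default psi = psi.huber is one Huber M-estimator. The function named in the excerpt is cut off, so using rlm() here is an assumption. The simulated data, with 20% vertical outliers, are hypothetical, and the sketch compares coefficients rather than reproducing the post's visual contrast.

```r
library(MASS)  # provides lqs() and rlm()

set.seed(42)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n)  # true intercept 2, true slope 3

# Contaminate 20% of the points with large upward outliers in y
bad <- sample(n, 20)
y[bad] <- y[bad] + 40

fit_ols <- lm(y ~ x)   # ordinary least squares, for reference
fit_lts <- lqs(y ~ x)  # default method = "lts" (least trimmed squares)
fit_hub <- rlm(y ~ x)  # default psi = psi.huber (Huber M-estimator)

rbind(ols = coef(fit_ols), lts = coef(fit_lts), huber = coef(fit_hub))
```

The outliers pull the OLS intercept well above 2, while both robust fits stay much closer to the true line.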
Actual coverage of confidence intervals for standard deviation
Overview: How big a sample size would you think you need to get a reliable 95% confidence interval (i.e. one that really does contain the true value 95% of the time) for a single univariate statistic like standard deviation? 30? 50? It turns out the answer is more like 20,000 for some not-particularly-extreme real-world data. In this post I explore th...
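The coverage question can be checked by simulation. This sketch assumes the classical chi-square interval for a standard deviation, which relies on normally distributed data, and compares its empirical coverage on normal and log-normal samples of size 30. The interval, distributions, sample size and helper names are all assumptions; the post's own data and methods are not shown in this excerpt.

```r
# Classical chi-square confidence interval for a standard deviation:
#   sqrt((n - 1) s^2 / qchisq(1 - a/2, n - 1)) to sqrt((n - 1) s^2 / qchisq(a/2, n - 1))
sd_ci <- function(x, level = 0.95) {
  n <- length(x)
  a <- 1 - level
  s2 <- var(x)
  c(lower = sqrt((n - 1) * s2 / qchisq(1 - a / 2, df = n - 1)),
    upper = sqrt((n - 1) * s2 / qchisq(a / 2, df = n - 1)))
}

# Share of simulated samples whose interval contains the true SD
coverage <- function(rgen, true_sd, n, reps = 2000) {
  hits <- replicate(reps, {
    ci <- sd_ci(rgen(n))
    ci["lower"] <= true_sd && true_sd <= ci["upper"]
  })
  mean(hits)
}

set.seed(123)
# Normal data: the interval's assumptions hold
cov_norm <- coverage(function(n) rnorm(n, sd = 2), true_sd = 2, n = 30)

# Log-normal(0, 1) data: true SD is sqrt((exp(1) - 1) * exp(1))
true_sd_ln <- sqrt((exp(1) - 1) * exp(1))
cov_lnorm <- coverage(function(n) rlnorm(n), true_sd = true_sd_ln, n = 30)

c(normal = cov_norm, lognormal = cov_lnorm)
```

With normal data the coverage sits near the nominal 95%, while heavy right skew drives it well below, which is the gap the post investigates.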