Publications by Peter's stats stuff - R
Declining sea ice in the Arctic
A number of data visualisations are circulating showing the disturbing rise in temperature at the North Pole and drop in coverage of Arctic sea ice. The current level of interest is credited to a tweet from Zack Labe, whose Twitter page is a great source of interesting visualisations on sea ice. Secondary examples chosen more or less at random ...
Error, trend, seasonality – ets and its forecast model friends
A broad family of fast and effective forecast methods. Exponential smoothing state space methods constitute a broad family of approaches to univariate time series forecasting that have been around for many decades but were only placed into a systematic framework in the twenty-first century. The definitive book on the subject is Hyndman, Koehler, Ord a...
Why time series forecasts prediction intervals aren’t as good as we’d hope
Five different sources of error. When it comes to time series forecasts from a statistical model we have five sources of error: random individual errors; random estimates of parameters (eg the coefficients for each autoregressive term); uncertain meta-parameters (eg number of autoregressive terms); unsure if the model was right for the historical da...
Extrapolation is tough for trees!
Out-of-sample extrapolation. This post is an offshoot of some simple experiments I made to help clarify my thinking about some machine learning methods. In this experiment I fit four kinds of model to a super-simple artificial dataset with two columns, x and y, and then try to predict new values of y based on values of x that are outside the orig...
Air quality in Indian cities
Seasonal air pollution in India. The motivation for this blog post was a conference paper I recently heard presented that analysed five years of daily pollution data in an Indian city with a non-seasonal auto-regressive integrated moving average (ARIMA) model. In discussion after the presentation, there were differing views on whether such data should be mo...
forecastHybrid 0.3.0 on CRAN
Make it easy to make ensemble time series forecasts. forecastHybrid is an R package to make it easier to use the average predictions of ‘ensembles’ (or ‘combinations’) of time series models from Rob Hyndman’s forecast package. It looks after the averaging, and also calculates prediction intervals by a conservative method that aims to red...
Extracting data on shadow economy from PDF tables
Data on the shadow economy? I’m reading Kenneth Rogoff’s The Curse of Cash. It was one of Bloomberg’s Best Books of 2016 and the Financial Times’ Best Economics Books of 2016, and I recommend it. It’s an excellent and convincing book, making the case for getting rid of large denomination notes for three reasons: to put the squeeze on...
Sparse matrices, k-means clustering, topic modelling with posts on the 2004 US Presidential election
Daily Kos bags of words from the time of the 2004 Presidential election. This is a bit of a rambly blog entry today. My original motivation was to just explore moving data around from R into the H2O machine learning software. While successful on this, and more on it below, I got a bit interested in my example data in its own right. The best ini...
Cross-validation of topic modelling
Determining the number of “topics” in a corpus of documents. In my last post I finished by topic modelling a set of political blogs from 2004. I made a passing comment that it’s a challenge to know how many topics to set; the R topicmodels package doesn’t do this for you. There’s quite a discussion on this out there, but nearly all the...
Books I like
If you’re serious about learning, you probably need to read a book at some point. These days if you want to learn applied statistics and data science tools, you have amazing options in the form of blogs, Q&A sites, massive open online courses and even videos on YouTube. Wikipedia is also an amazing reference resource on statistics. I use ...