Publications by andrew
Finding Correlations in Data with Uncertainty
A week or so ago a colleague of mine asked if I knew how to calculate correlations for data with uncertainties. Now, if we are honest, all data have some level of experimental or measurement error. However, I suspect that in the majority of cases these uncertainties are ignored when considering correlations. To what degree...
Finding Correlations in Data with Uncertainty: Classical Solution
This post follows up on my previous one, prompted by an excellent suggestion from Andrej Spiess. The data are indeed very heteroscedastic! Andrej suggested that an alternative way to attack this problem would be to use weighted correlation, with each weight being the inverse of the measurement variance. Let’s look at the synthetic data first. > library(...
Fitting a Model by Maximum Likelihood
Maximum-Likelihood Estimation (MLE) is a statistical technique for estimating model parameters. It sets out to answer the question: what model parameters are most likely to characterise a given set of data? First you need to select a model for the data, and that model must have one or more (unknown) parameters. As the name implies, MLE p...
The Wonders of foreach
Writing code from scratch to do parallel computations can be rather tricky. However, the packages providing parallel facilities in R make it remarkably easy. One such package is foreach. I am going to document my trail of discovery with foreach, which began some time ago, but has really come to fruition over the last few weeks. First we need a ...
Presenting Conformance Statistics
A client came to me with some conformance data. She was having a hard time making sense of it in a spreadsheet. I had a look at a couple of ways of presenting it that would bring out the important points. The Data: The data came as a spreadsheet with multiple sheets. Each of the sheets had a slightly different format, so the easiest thing to do wa...
Text Mining the Complete Works of William Shakespeare
I am starting a new project that will require some serious text mining. So, in the interests of bringing myself up to speed on the tm package, I thought I would apply it to the Complete Works of William Shakespeare and just see what falls out. The first order of business was getting my hands on all that text. Fortunately it is available from a nu...
Clustering the Words of William Shakespeare
In my previous post I used the tm package to do some simple text mining on the Complete Works of William Shakespeare. Today I am taking some of those results and using them to generate word clusters. Preparing the Data: I will start with the Term Document Matrix (TDM) consisting of 71 words commonly used by Shakespeare. > inspect(TDM.common[1:10,...
Clustering Lightning Discharges to Identify Storms
A short talk that I gave at the LIGHTS 2013 Conference (Johannesburg, 12 September 2013). The slides are relatively devoid of text because I like the audience to hear the content rather than read it. The central message of the presentation is that clustering lightning discharges into storms is not a trivial task, but still a worthwhile challenge ...
Citations for using Stan?
Bob writes: If you have papers that have used Stan, we’d love to hear about it. We finally got some submissions, so we’re going to start a list on the web site for 2.0 in earnest. You can either mail them to the list, to me directly, or just update the issue (at least until it’s closed or moved): https://github.com/stan-dev/stan/issues/18...
Top 250 Movies at IMDb
Some years ago I allowed myself to accept a challenge to read the Top 100 Novels of All Time (complete list here). This list was put together by Richard Lacayo and Lev Grossman at Time Magazine. To start with I could tick off a number of books that I had already read. That left me with around 75 books outstanding. So I knuckled down. The Lord ...