Publications by andrew

The Wonders of foreach

25.08.2013

Writing code from scratch to do parallel computations can be rather tricky. However, the packages providing parallel facilities in R make it remarkably easy. One such package is foreach. I am going to document my trail of discovery with foreach, which began some time ago but has really come to fruition over the last few weeks. First we need a ...
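As a rough illustration of the basic foreach pattern, here is a minimal sketch. It assumes only that the foreach package is installed, and the toy task (square roots of 1 to 4) is purely hypothetical. It runs sequentially with %do%. Swapping in %dopar% after registering a parallel backend (for example via the doParallel package) is what would spread the iterations across cores.

```r
library(foreach)

# Each iteration returns one value; .combine = c collects the results into a vector.
# %do% evaluates the iterations sequentially, so no parallel backend is required.
roots <- foreach(i = 1:4, .combine = c) %do% {
  sqrt(i)
}

print(roots)
```

The .combine argument controls how per-iteration results are merged. Without it, foreach returns a list.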

Presenting Conformance Statistics

27.08.2013

A client came to me with some conformance data. She was having a hard time making sense of it in a spreadsheet. I had a look at a couple of ways of presenting it that would bring out the important points. The data came as a spreadsheet with multiple sheets. Each of the sheets had a slightly different format, so the easiest thing to do wa...

Text Mining the Complete Works of William Shakespeare

05.09.2013

I am starting a new project that will require some serious text mining. So, in the interests of bringing myself up to speed on the tm package, I thought I would apply it to the Complete Works of William Shakespeare and just see what falls out. The first order of business was getting my hands on all that text. Fortunately it is available from a nu...

Clustering the Words of William Shakespeare

10.09.2013

In my previous post I used the tm package to do some simple text mining on the Complete Works of William Shakespeare. Today I am taking some of those results and using them to generate word clusters. To prepare the data, I will start with the Term Document Matrix (TDM) consisting of 71 words commonly used by Shakespeare. > inspect(TDM.common[1:10,...
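The excerpt does not show how the clusters were computed. As a hedged sketch of one common approach, the code below applies base R hierarchical clustering to the rows (words) of a term-document matrix. The counts are made up for illustration and are not Shakespeare's. Both Euclidean distance and Ward linkage ("ward.D2") are assumptions, not necessarily what the post used.

```r
# Hypothetical toy term-document matrix: rows are words, columns are documents.
# These counts are invented for illustration only.
tdm <- matrix(
  c(10, 12,  1,  0,
     9, 11,  2,  1,
     0,  1, 15, 14,
     1,  0, 13, 16),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("love", "heart", "king", "crown"), paste0("doc", 1:4))
)

# dist() measures Euclidean distances between rows, i.e. between words.
d <- dist(tdm)

# Agglomerative clustering with Ward linkage (an assumed choice).
fit <- hclust(d, method = "ward.D2")

# Cut the dendrogram into two groups of words.
groups <- cutree(fit, k = 2)
print(groups)
```

Calling plot(fit) would draw the dendrogram. Words with similar usage across documents end up in the same branch.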

Clustering Lightning Discharges to Identify Storms

13.09.2013

A short talk that I gave at the LIGHTS 2013 Conference (Johannesburg, 12 September 2013). The slides are relatively devoid of text because I like the audience to hear the content rather than read it. The central message of the presentation is that clustering lightning discharges into storms is not a trivial task, but still a worthwhile challenge ...

Citations for using Stan?

23.09.2013

Bob writes: If you have papers that have used Stan, we’d love to hear about it. We finally got some submissions, so we’re going to start a list on the web site for 2.0 in earnest. You can either mail them to the list, to me directly, or just update the issue (at least until it’s closed or moved): https://github.com/stan-dev/stan/issues/18...

Top 250 Movies at IMDb

02.10.2013

Some years ago I allowed myself to accept a challenge to read the Top 100 Novels of All Time. This list was put together by Richard Lacayo and Lev Grossman at Time Magazine. To start with I could tick off a number of books that I had already read. That left me with around 75 books outstanding. So I knuckled down. The Lord ...

Applying an Operation to a List of Variables

14.10.2013

Just a quick note on a short hack that I cobbled together this morning. I have an analysis where I need to perform the same set of operations on a list of variables. In order to do this in a compact and robust way, I wanted to write a loop that would run through the variables and apply the operations to each of them in turn. This can be done usin...
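The excerpt is cut off before it describes the method. The following is one possible sketch, not necessarily the author's hack. It loops over a character vector of variable names and uses [[ ]] so that each column can be selected by a string. The data frame, the column names and the standardising operation are all hypothetical.

```r
# Hypothetical data frame; column names are invented for illustration.
df <- data.frame(
  height = c(1.6, 1.8, 1.7),
  weight = c(60, 80, 70),
  group  = c("a", "b", "a")
)

# Names of the variables that need the same treatment.
vars <- c("height", "weight")

# [[ ]] accepts a string, so the loop works for any list of column names.
for (v in vars) {
  # Example operation: standardise the column and store it under a new name.
  df[[paste0(v, "_z")]] <- as.numeric(scale(df[[v]]))
}

print(names(df))
```

Adding a variable to the analysis then only requires appending its name to vars.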

Plotting Times of Discrete Events

19.10.2013

I recently enjoyed reading O’Hara, R. B., & Kotze, D. J. (2010). Do not log-transform count data. Methods in Ecology and Evolution, 1(2), 118–122. doi:10.1111/j.2041-210X.2010.00021.x. The article prompted me to think about processes involving discrete events and how these might be presented graphically. I am not talking about counts (which ...

R package for effect size calculations for psychology researchers

19.10.2013

Dan Gerlanc writes: I read your post the other day [now the other month, as our blog is on a bit of a delay] on helping psychologists do research and thought you might be interested in our R package, “bootES”, for robust effect size calculation and confidence interval estimation using resampling techniques. The package provides one function, ...