Publications by Andrew Collier
Casting a Wide (and Sparse) Matrix in R
I routinely use melt() and cast() from the reshape2 package as part of my data munging workflow. Recently I’ve noticed that the data frames I’ve been casting are often extremely sparse. Stashing these in a dense data structure just feels wasteful. And the dismal drone of page thrashing is unpleasant. So I had a look around for an alternative....
1485 sym R (806 sym/4 pcs)
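As a rough illustration of the idea in the teaser above, long-format data can be cast straight into a sparse matrix rather than a dense data frame. This is a minimal sketch, assuming the Matrix package is installed; the `long` data frame and its `user`, `item` and `value` columns are hypothetical, and this is not necessarily the alternative the post settles on.

```r
library(Matrix)

# Hypothetical long-format data, shaped like the output of melt():
# one row per non-empty cell of the eventual wide table.
long <- data.frame(
  user  = factor(c("a", "a", "b", "c")),
  item  = factor(c("x", "z", "y", "z")),
  value = c(1, 2, 3, 4)
)

# Use the factor codes as row and column indices, so only the
# non-zero cells are ever stored.
wide <- sparseMatrix(
  i        = as.integer(long$user),
  j        = as.integer(long$item),
  x        = long$value,
  dims     = c(nlevels(long$user), nlevels(long$item)),
  dimnames = list(levels(long$user), levels(long$item))
)

print(wide)
```

Cells that never appear in `long` are simply absent from the sparse structure instead of being filled with zeros or `NA`.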
Kaggle: Santa’s Stolen Sleigh
This morning I read Wendy Kan’s interesting post on Creating Santa’s Stolen Sleigh. I hadn’t really thought too much about the process of constructing an optimisation competition, but Wendy gave some interesting insights on the considerations involved in designing a competition which was both fun and challenging but still computationally fe...
10207 sym R (950 sym/2 pcs) 28 img 2 tbl
flipsideR: Support for ASX Option Chain Data
I previously wrote about some ad hoc R code for downloading Option Chain data from Google Finance. I finally wrapped it up into a package called flipsideR, which is now available via GitHub. Since I last wrote on this topic I’ve also added support for downloading option data from the Australian Securities Exchange (ASX). Installation Installati...
1614 sym R (2479 sym/5 pcs)
Durban R Users Group Meetup: 24 February 2016 @ The Green Door
We’re kicking off the inaugural meeting of the Durban R Users Group with a live video presentation by Andrie de Vries (Senior Programme Manager, R Community Projects at Microsoft / Revolution Analytics). Andrie will be talking about “Demonstration of using R in the cloud together with Azure Machine Learning”. If you’ve kept up with Micros...
1465 sym
R, HDF5 Data and Lightning
I used to spend an inordinate amount of time digging through lightning data. These data came from a number of sources, with the World Wide Lightning Location Network (WWLLN) and LIS/OTD being the most common. I recently needed to work with some Hierarchical Data Format (HDF) data. HDF is something of a niche format and, since that was the format used ...
5701 sym R (613 sym/7 pcs) 4 img
International Open Data Day
As part of International Open Data Day we spent the morning with a bunch of like-minded people poring over some open Census South Africa data. Excellent initiative, @opendatadurban! I’m very excited to see where this is all going and look forward to contributing to the journey! The data above show the distribution of ages in a segment of the S...
1082 sym
R Saturday [satRday] in Cape Town
I put in a proposal to host an R Saturday [satRday] in Cape Town next year. The R Consortium has committed to funding three of these events: one will be in Hungary, another will be somewhere in the USA and the third will be elsewhere in the world. The voting has opened for the location of these events. Cast your vote here. Please consider voting f...
960 sym 2 img
Major League Baseball Birth Months
“The cutoff date for almost all nonschool baseball leagues in the United States is July 31, with the result that more major league players are born in August than in any other month.” (Malcolm Gladwell, Outliers) A quick analysis to confirm Gladwell’s assertion above. I used data scraped from www.baseball-reference.com. Here’s the evidence: Distrib...
1363 sym R (338 sym/2 pcs) 2 img
Most Probable Birth Month
In a previous post I showed that the data from www.baseball-reference.com support Malcolm Gladwell’s contention that more professional baseball players are born in August than any other month. Although this might be explained by the 31 July cutoff for admission to baseball leagues, it was suggested that it could also be linked to a larger propo...
1780 sym R (1023 sym/4 pcs) 4 img
Birth Month by Gender
Based on some feedback to a previous post I normalised the birth counts by the (average) number of days in each month. As pointed out by a reader, the results indicate a gradual increase in the number of conceptions during (northern hemisphere) Autumn and Winter, roughly up to the end of December. Normalising the data to give births per day also ...
1016 sym R (756 sym/2 pcs) 4 img
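The normalisation described in the teaser above (birth counts divided by the average number of days in each month) can be sketched as follows. The counts here are made-up placeholders, not the post's data, and 28.25 days for February is an assumed average that allows for leap years.

```r
# Made-up monthly birth counts, for illustration only (not the post's data).
births <- setNames(rep(1000, 12), month.abb)

# Average days per month; February is taken as 28.25 to allow for leap years.
days <- c(31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Births per day: removes the bias towards longer months.
per_day <- births / days

print(round(per_day, 2))
```

With equal raw counts in every month, the shorter months come out with the higher per-day rates, which is exactly the effect the normalisation is meant to expose.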