Publications by Andrew Collier
#MonthOfJulia Day 25: Interfacing with Other Languages
Julia has native support for calling C and Fortran functions. There are also add-on packages which provide interfaces to C++, R and Python. We’ll have a brief look at the support for C and R here. Further details on these and the other supported languages can be found on GitHub. Why would you want to call other languages from within Julia? Here...
3010 sym R (2601 sym/11 pcs) 2 img
Review: Data Mining with Rattle and R
I read Data Mining with Rattle and R by Graham Williams over a year ago. It’s not a new book and I’ve just been tardy in writing up a review. That’s not to say that I have not used the book in the interim: it’s been on my desk at work ever since and I’ve dipped into it from time to time. As a reference for ongoing analyses it’s an ext...
5717 sym R (268 sym/1 pcs) 4 img
Review: Beautiful Data
I’ve just finished reading Beautiful Data (published by O’Reilly in 2009), a collection of essays edited by Toby Segaran and Jeff Hammerbacher. The 20 essays from 39 contributors address a diverse array of topics relating to data and how it’s collected, analysed and interpreted. Since this is a collection of essays, the writing style and l...
5332 sym R (27 sym/1 pcs) 2 img
LIBOR and Bond Yields
I’ve just been looking at the historical relationship between the London Interbank Offered Rate (LIBOR) and government bond yields. LIBOR data can be found at Quandl and comes in CSV format, so it’s pretty simple to digest. The bond data can be sourced from the US Department of the Treasury. It comes as XML and requires a little more work. > ...
1520 sym R (738 sym/1 pcs)
Graph from Sparse Adjacency Matrix
I spent a decent chunk of my morning trying to figure out how to construct a sparse adjacency matrix for use with graph.adjacency(). I’d have thought that this would be rather straightforward, but I tripped over a few subtle issues with the Matrix package. My biggest problem (which in retrospect seems rather trivial) was that elements in my ad...
1262 sym R (1124 sym/4 pcs) 2 img
Installing XGBoost on Ubuntu
XGBoost is the flavour of the moment for serious competitors on Kaggle. It was developed by Tianqi Chen and provides a particularly efficient implementation of the Gradient Boosting algorithm. Although there is a CLI implementation of XGBoost, you’ll probably be more interested in using it from either R or Python. Below are instructions for gett...
1794 sym R (145 sym/3 pcs) 2 img
Making Sense of Logarithmic Loss
Logarithmic Loss, or simply Log Loss, is a classification loss function often used as an evaluation metric in Kaggle competitions. Since success in these competitions hinges on effectively minimising the Log Loss, it makes sense to have some understanding of how this metric is calculated and how it should be interpreted. Log Loss quantifies the a...
4630 sym R (321 sym/2 pcs) 10 img
Using Checksum to Guess Message Length: Not a Good Idea!
A question posed by one of my colleagues: can a checksum be used to guess message length? My immediate response was negative and, as it turns out, a simple simulation supported this knee-jerk reaction. Here’s the situation: a piece of software has been written to process a stream of messages. Each message is a sequence of bytes, where the lengt...
3560 sym R (88 sym/2 pcs) 2 img
Review: Learning Shiny
I was asked to review Learning Shiny (Hernán G. Resnizky, Packt Publishing, 2015). I found the book to be useful, motivating and generally easy to read. I’d already spent some time dabbling with Shiny, but the book helped me graduate from paddling in the shallows to wading out into the Shiny sea. The book states its objective as: … this book...
6012 sym 2 img
Kaggle: Walmart Trip Type Classification
Walmart Trip Type Classification was my first real foray into the world of Kaggle and I’m hooked. I previously dabbled in What’s Cooking but that was as part of a team and the team didn’t work out particularly well. As a learning experience the competition was second to none. My final entry put me at position 155 out of 1061 entries which, ...
6749 sym R (793 sym/2 pcs) 12 img