Publications by Tony Hirst
Reshaping Horse Import/Export Data to Fit a Sankey Diagram
As the food labeling and substituted horsemeat saga rolls on, I’ve been surprised at how little use has been made of “data” to put the structure of the food chain into some sort of context* (or maybe I’ve just missed those stories?). One place that can almost always be guaranteed to post a few related datasets is the Guardian Datastore, w...
5803 sym R (2071 sym/2 pcs) 16 img
Sketches Around Twitter Followers
I’ve been doodling… Following a query about the possible purchase of Twitter followers for various public figure accounts (I need to get my head round what the problem is with that exactly?!), I thought I’d have a quick look at some stats around follower groupings… I started off with a data grab, pulling down the IDs of accounts on a part...
4753 sym 26 img
What Happened Then? Using Approximated Twitter Follower Accession to Identify Political Events
Following a chat with @andypryke, I thought I’d try out a simple bit of feature detection around approximated follower acquisition charts (e.g. Estimated Follower Accession Charts for Twitter) to see if I could detect dates around which there were spikes in follower acquisition. So for example, here’s the follower acquisition chart for Seem Ma...
6625 sym R (2081 sym/3 pcs) 18 img
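The excerpt above does not say which feature detection method the post uses, so the following is only a minimal Python sketch of one common approach (a z-score threshold), not the author's R code. The function name `find_spikes`, the `threshold` parameter, and the sample counts are all hypothetical, chosen for illustration.

```python
from statistics import mean, pstdev


def find_spikes(daily_counts, threshold=2.0):
    """Return dates whose count is more than `threshold` standard
    deviations above the mean of all counts.

    `daily_counts` is a list of (date_label, new_follower_count) pairs.
    """
    counts = [count for _, count in daily_counts]
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # Every day has the same count, so nothing stands out.
        return []
    return [day for day, count in daily_counts
            if (count - mu) / sigma > threshold]


# Made-up sample data: estimated new followers per day.
sample = [
    ("2013-03-01", 3), ("2013-03-02", 4), ("2013-03-03", 2),
    ("2013-03-04", 5), ("2013-03-05", 3), ("2013-03-06", 40),
    ("2013-03-07", 4), ("2013-03-08", 3),
]
spike_dates = find_spikes(sample)
```

A single global threshold like this is crude: one very large spike inflates the standard deviation and can hide smaller ones, so a rolling window or a median-based measure may suit real follower data better.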
Publishing Stats for Analytic Reuse – FAOStat Website and R Package
How can stats and data publishers, from NGOs and (inter)national statistics agencies to scientific researchers, publish their data in a way that supports its analysis directly, as well as in combination with other datasets? Here’s one approach I learned about from Michael Kao of the UN Food and Agriculture Organisation statistics division, FAOS...
4685 sym 12 img
Revisiting MPs’ Expenses
I couldn’t help but notice the chatter about Iain Duncan Smith claiming he’d have no problem “living on 53 pounds a week”, which made me wonder not only how many meal catered events he attends each week (and how many of his scheduled meetings also have complimentary tea and biscuits (a bellwether of the extent of cuts in many institutions...
5706 sym R (2596 sym/1 pcs) 12 img
Splitting a Large CSV File into Separate Smaller Files Based on Values Within a Specific Column
One of the problems with working with data files containing tens of thousands (or more) rows is that they can become unwieldy, if not impossible, to use with “everyday” desktop tools. When I was Revisiting MPs’ Expenses, the expenses data I downloaded from IPSA (the Independent Parliamentary Standards Authority) came in one large CSV file p...
1752 sym R (528 sym/1 pcs) 6 img
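The listing indicates that the original post includes a short piece of R code. As a rough illustration of the technique named in the title, here is a minimal Python sketch rather than the author's implementation. The function name `split_csv_by_column`, its parameters, and the file naming rule are assumptions made for this example.

```python
import csv
import os
import re


def split_csv_by_column(src_path, out_dir, column):
    """Write one CSV file per distinct value found in `column`.

    Returns a dict mapping each distinct value to the path of its file.
    Rows are streamed, so the source file is never loaded whole.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # column value -> open output file
    writers = {}  # column value -> csv.DictWriter for that file
    paths = {}    # column value -> output path
    try:
        with open(src_path, newline="", encoding="utf-8") as src:
            reader = csv.DictReader(src)
            for row in reader:
                key = row[column]
                if key not in writers:
                    # Hypothetical naming rule: replace characters that are
                    # unsafe in file names. Two values that map to the same
                    # safe name would overwrite each other.
                    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", key) or "blank"
                    paths[key] = os.path.join(out_dir, safe + ".csv")
                    handles[key] = open(paths[key], "w", newline="",
                                        encoding="utf-8")
                    writers[key] = csv.DictWriter(
                        handles[key], fieldnames=reader.fieldnames)
                    # Each output file repeats the source header row.
                    writers[key].writeheader()
                writers[key].writerow(row)
    finally:
        for handle in handles.values():
            handle.close()
    return paths
```

This keeps one output file open per distinct value, which is fine for a column with a modest number of categories. A column with thousands of distinct values could hit the operating system's open-file limit and would need a different strategy, such as sorting first or appending to files on demand.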
Estimated Follower Accession Charts for Twitter
Just over a year or so ago, Mat Morrison/@mediaczar introduced me to a visualisation he’d been working on (How should Page Admins deal with Flame Wars?) that I started to refer to as an accession chart (Visualising Activity Around a Twitter Hashtag or Search Term Using R). The idea is that we provide each entrant into a conversation or group wi...
5670 sym R (547 sym/1 pcs) 8 img
Evaluating Event Impact Through Social Media Follower Histories, With Possible Relevance to cMOOC Learning Analytics
Last year I sat on a couple of panels organised by I’m a Scientist’s Shane McCracken at various science communication conferences. A couple of days ago, I noticed Shane had popped up a post asking Who are you Twitter?, a quick review of a social media mapping exercise carried out on the followers of the @imascientist Twitter account. Using t...
3087 sym 6 img
Datagrabbing Commonly Formatted Sheets from a Google Spreadsheet – Guardian 2014 University Guide Data
So it seems like it’s that time of year when the Guardian publish their university rankings data (Datablog: University guide 2014), which means another opportunity to have a tinker and see what I’ve learned since last year… (Last year’s hack was a Filtering Guardian University Data Every Which Way You Can…, where I had a quick go at cre...
3438 sym R (1273 sym/7 pcs) 14 img
Disposable Visual Data Explorers with Shiny – Guardian University Tables 2014
Have data – now what? Building your own interactive data explorer need not be a chore with the R shiny library… Here’s a quick walkthrough… In Datagrabbing Commonly Formatted Sheets from a Google Spreadsheet – Guardian 2014 University Guide Data, I showed how to grab some data from several dozen commonly formatted sheets in a Google sp...
5415 sym R (4957 sym/6 pcs) 8 img