Publications by Tony Hirst

Sketching Scatterplots to Demonstrate Different Correlations

17.12.2014

Looking just now for an openly licensed graphic showing a set of scatterplots that demonstrate different correlations between X and Y values, I couldn’t find one. So here’s a quick R script for constructing one, based on a Cross Validated question/answer (Generate two variables with precise pre-specified correlation): library(MASS) corrdata=...

1147 sym R (519 sym/1 pcs) 8 img
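The script itself is cut off above. Below is a minimal sketch of the same idea, assuming the Cross Validated approach of drawing from a bivariate normal with MASS::mvrnorm() and empirical = TRUE, so that each sample's correlation matches the target exactly. The helper make_corr_data() and the chosen correlation values are illustrative, not taken from the original post. Base graphics are used for the panels.

```r
library(MASS)  # provides mvrnorm()

set.seed(42)

# Hypothetical helper: n (x, y) pairs whose *sample* correlation is exactly r.
# empirical = TRUE makes mvrnorm() match the sample (not just population) covariance.
make_corr_data <- function(r, n = 100) {
  Sigma <- matrix(c(1, r, r, 1), nrow = 2)  # unit variances, covariance r
  xy <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma, empirical = TRUE)
  data.frame(x = xy[, 1], y = xy[, 2], r = r)
}

corrs <- c(-0.9, -0.5, 0, 0.5, 0.9)
corrdata <- do.call(rbind, lapply(corrs, make_corr_data))

# One scatterplot panel per target correlation (base graphics, 2 x 3 grid)
op <- par(mfrow = c(2, 3))
for (r in corrs) {
  d <- corrdata[corrdata$r == r, ]
  plot(d$x, d$y, main = paste("r =", r), xlab = "x", ylab = "y", pch = 20)
}
par(op)
```

Without empirical = TRUE, each panel's sample correlation would only be close to r rather than equal to it.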

Custom Gridlines and Line Guides in R/ggplot Charts

02.01.2015

In the last quarter of last year, I started paying more attention to the use of custom grid lines and line guides in charts I’ve been developing for the Wrangling F1 Data With R book. The use of line guides was in part inspired by canopy views from within the cockpit of one of the planes that make up the Red Arrows aerobatic display team. A l...

2886 sym R (731 sym/2 pcs) 10 img
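The excerpt doesn't show the chart code. The sketch below illustrates the general ggplot2 mechanisms for this, assuming toy lap/gap data and arbitrary guide positions that are not taken from the book. Reference lines are added with geom_hline() and geom_vline(), and the default gridlines are reshaped with scale breaks and theme() elements.

```r
library(ggplot2)

# Toy data (hypothetical): a gap in seconds to some reference car, lap by lap
df <- data.frame(lap = 1:20)
df$gap <- 2 * sin(df$lap / 3)

p <- ggplot(df, aes(x = lap, y = gap)) +
  geom_line() +
  # Line guide: dashed horizontal reference at zero gap
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  # Line guides: vertical markers at two (hypothetical) laps of interest
  geom_vline(xintercept = c(5, 15), colour = "red", alpha = 0.5) +
  # Custom gridlines: major x breaks every 5 laps ...
  scale_x_continuous(breaks = seq(0, 20, by = 5)) +
  theme_minimal() +
  # ... no minor gridlines, and dotted light-grey horizontal major gridlines
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(colour = "grey85", linetype = "dotted"))

print(p)
```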

Book Extras – Data Files, Code Files and a Dockerised Application

05.01.2015

Idling through the LeanPub documentation last night, I noticed that they support the ability to sell digital extras, such as bundled code files or datafiles. Along with the base book sold at one price, additional extras can be bundled into packages alongside the original book and sold at another (higher) price. As with the book sales, two price p...

2655 sym R (659 sym/1 pcs) 6 img

Calculating Churn in Seasonal Leagues

09.01.2015

One of the things I wanted to explore in the production of the Wrangling F1 Data With R book was the extent to which I could draw on published academic papers for inspiration in exploring the various results and timing datasets. In a chapter published earlier this week, I explored the notion of churn, as described in Mizak, D, Neral, J & Stai...

2675 sym R (814 sym/2 pcs) 42 img
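The churn calculation is cut off above. Here is a minimal base R sketch, assuming churn is taken as the mean absolute change in finishing position between consecutive seasons (my reading of the basic, unadjusted measure). The standings data and helper names are hypothetical. Comparing only competitors present in both seasons is also an assumption, since F1 drivers come and go between seasons.

```r
# Hypothetical helper: churn between two seasons, as the mean absolute change in
# finishing position. prev and curr are named vectors (name = competitor,
# value = position); only competitors present in both seasons are compared.
churn <- function(prev, curr) {
  common <- intersect(names(prev), names(curr))
  mean(abs(curr[common] - prev[common]))
}

# Toy standings (hypothetical), one row per competitor per season
standings <- data.frame(
  season = rep(2012:2014, each = 3),
  driver = rep(c("A", "B", "C"), times = 3),
  pos    = c(1, 2, 3,   2, 1, 3,   2, 1, 3)
)

# Finishing positions for one season as a named vector
positions_in <- function(s) {
  d <- standings[standings$season == s, ]
  setNames(d$pos, d$driver)
}

seasons <- sort(unique(standings$season))
churn_series <- sapply(seq_along(seasons)[-1], function(i)
  churn(positions_in(seasons[i - 1]), positions_in(seasons[i])))
names(churn_series) <- seasons[-1]
churn_series
```

Here A and B swap places between 2012 and 2013 while C stays put, giving a churn of 2/3 for 2013. Nothing changes in 2014, so its churn is 0.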

Connecting RStudio and MySQL Docker Containers – an example using the ergast db

17.01.2015

Building on Dockerising Open Data Databases – First Fumblings and my Book Extras – Data Files, Code Files and a Dockerised Application, I just figured out how to get the ergast db into a MySQL docker container and then query it from RStudio: Download and unzip the f1db.sql.gz file to f1db.sql; install these docker-mysql-scripts; run boot2docke...

2557 sym 10 img
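The excerpt stops before the R side of things. As a hedged sketch, a small helper for connecting from RStudio might look like the one below. It assumes the RMySQL driver via DBI, and the connection details are guesses: the host, credentials, environment variable names and a database called f1db are all assumptions. With boot2docker, the container's mapped port is reached via the VM's address (as reported by boot2docker ip) rather than localhost.

```r
# Hypothetical helper: connect to the ergast db running in a MySQL container.
# All defaults (env var names, user, database name "f1db") are assumptions.
connect_f1db <- function(host = Sys.getenv("F1DB_HOST", "localhost"),
                         port = 3306L,
                         user = Sys.getenv("F1DB_USER", "root"),
                         password = Sys.getenv("F1DB_PASSWORD"),
                         dbname = "f1db") {
  DBI::dbConnect(RMySQL::MySQL(), host = host, port = port,
                 username = user, password = password, dbname = dbname)
}

# Usage (requires the container to be running and reachable):
# con <- connect_f1db(host = "<boot2docker VM address>")
# DBI::dbGetQuery(con, "SELECT year, round, name FROM races LIMIT 5")
# DBI::dbDisconnect(con)
```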

Rediscovering Formula One Race Battlemaps

31.01.2015

A couple of days ago, I posted a recipe on the F1DataJunkie blog that described how to calculate track position from laptime data. Using that information, as well as additional derived columns such as the identity of, and time to, the cars immediately ahead of and behind a particular selected driver, both in terms of track position and race posit...

3996 sym 8 img
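The recipe itself isn't reproduced in this excerpt. As a rough illustration, the base R sketch below derives race position on each lap and the time gap to the car immediately ahead. It assumes a hypothetical laps data frame with driver, lap and time (seconds) columns, and the data are made up. It does not account for lapped cars, so it computes race position rather than track position.

```r
# Toy lap-time data (hypothetical): three drivers, three laps, times in seconds
laps <- data.frame(
  driver = rep(c("HAM", "ROS", "VET"), each = 3),
  lap    = rep(1:3, times = 3),
  time   = c(95.0, 92.0, 92.5,   94.0, 93.5, 93.0,   96.0, 92.0, 91.0)
)

# Elapsed race time at the end of each lap (cumulative sum within each driver)
laps <- laps[order(laps$driver, laps$lap), ]
laps$elapsed <- ave(laps$time, laps$driver, FUN = cumsum)

# Race position on each lap: rank of elapsed time among drivers on that lap
laps$pos <- ave(laps$elapsed, laps$lap,
                FUN = function(x) rank(x, ties.method = "first"))

# Gap to the car immediately ahead in race position (NA for the leader)
laps <- laps[order(laps$lap, laps$pos), ]
laps$gap_ahead <- ave(laps$elapsed, laps$lap, FUN = function(x) c(NA, diff(x)))

laps
```

In this toy data the lead changes hands on every lap, which is the kind of battle a battlemap is meant to surface.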

Code as Magic, and the Vernacular of Data Wrangling Verbs

11.02.2015

It’s been some time now since I drafted most of my early unit contributions to the TM351 Data management and analysis course. Part of the point (for me) in drafting that material was to find out what sorts of things we actually wanted to say and help identify the sorts of abstractions we wanted to then build a narrative around. Another part of t...

6455 sym 14 img

Tools in Tandem – SQL and ggplot. But is it Really R?

28.02.2015

Increasingly I find that I have fallen into using not-really-R whilst playing around with Formula One stats data. Instead, I seem to be using a hybrid of SQL to get data out of a small SQLite3 database and into an R dataframe, and then ggplot2 to visualise it. So for example, I’ve recently been dabbling with laptime data from the ergast d...

3039 sym R (854 sym/2 pcs) 8 img
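Below is a minimal sketch of that SQL-then-ggplot2 pattern, assuming the DBI, RSQLite and ggplot2 packages are installed. A few made-up rows in an in-memory SQLite database stand in for the ergast data, reusing the ergast lapTimes column names (raceId, driverId, lap, milliseconds). The query and chart are illustrative, not the ones from the post.

```r
library(DBI)
library(ggplot2)

# In-memory SQLite database with toy lap times (hypothetical rows)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "lapTimes", data.frame(
  raceId = 1, driverId = rep(c(1, 2), each = 3), lap = rep(1:3, 2),
  milliseconds = c(95000, 92000, 92500, 94000, 93500, 93000)
))

# SQL does the filtering and unit conversion; R just receives a data frame
laptimes <- dbGetQuery(con, "
  SELECT driverId, lap, milliseconds / 1000.0 AS seconds
  FROM lapTimes
  WHERE raceId = 1
  ORDER BY driverId, lap")
dbDisconnect(con)

# ggplot2 does the rendering
g <- ggplot(laptimes, aes(x = lap, y = seconds, colour = factor(driverId))) +
  geom_line()
print(g)
```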

So What Can Text Analysis Do for You?

02.03.2015

Despite believing we can treat anything we can represent in digital form as “data”, I’m still pretty flakey on understanding what sorts of analysis we can easily do with different sorts of data. Time series analysis is one area – the pandas Python library has all manner of handy tools for working with that sort of data that I have no idea...

6797 sym 8 img

What’s the Point of an API?

09.03.2015

Trying to clear my head of code on a dog walk after a couple of days tinkering with the nomis API, I started to ponder what an API is good for. Chris Gutteridge and Alex Dutton’s open data excuses bingo card and Owen Boswarva’s Open Data Publishing Decision Tree both suggest that not having an API can be used as an excuse for not publishi...

3704 sym 16 img