Publications by nsaunders
How long since your team scored 100+ points? This blog’s first foray into the fitzRoy R package
When this blog moved from bioinformatics to data science I ran a Twitter poll to ask whether I should start afresh at a new site or continue here. “Continue here”, you said. So let’s test the tolerance of the long-time audience and celebrate the start of the 2019 season as we venture into the world of – Australian football (AFL) statistic...
Mapping the Vikings using R
The commute to my workplace is 90 minutes each way. Podcasts are my friend. I’m a long-time listener of In Our Time and enjoyed the recent episode about The Danelaw. Melvyn and I hail from the same part of the world, and I learned as a child that many of the local place names there were derived from Old Norse or Danish. Notably: places ending i...
Geelong and the curse of the bye
This week we return to Australian Rules Football, the R package fitzRoy and some statistics to ask – why can’t Geelong win after a bye? (with apologies to long-time readers who used to come for the science) Code and a report for this blog post are available at GitHub. First, some background. In 2011 the AFL expanded from 16 to 17 teams with ...
Can random forest provide insights into how yeast grows?
I’m not saying this is a good idea, but bear with me. A recent question on Stack Overflow [r] asked why a random forest model was not working as expected. The questioner was working with data from an experiment in which yeast was grown under conditions where (a) the growth rate could be controlled and (b) one of 6 nutrients was limited. Their ...
Twitter coverage of the useR! 2019 conference
Very briefly: last week was useR! conference time again, coming to you this time from Toulouse, France. I’ve retrieved 8 318 tweets that mention #user2019 and run them through my report generator, and here are the results. Take-home message this year: the R Ladies rock!
Extracting Sydney transport data from Twitter
The @sydstats Twitter account uses this code base, and data from the Transport for NSW Open Data API to publish insights into delays on the Sydney Trains network. Each tweet takes one of two forms and is consistently formatted, making it easy to parse and extract information. Here are a couple of examples with the interesting parts highlighted in...
Debuting in a VFL/AFL Grand Final is rare
When Marlion Pickett runs onto the M.C.G. for Richmond in the AFL Grand Final this Saturday, he’ll be only the sixth player in 124 finals to debut on the big day. The sole purpose of this blog post is to illustrate how incredibly easy it is to figure this out, thanks to the dplyr and fitzRoy packages.
library(dplyr)
library(fitzRoy)
afldata <...
Florence Nightingale’s “rose charts” (and others) in ggplot2
It’s been a while. I hope you are all well. Shall we make some charts? About this time last year, one of my life-long dreams came true when I was told that I could work from home indefinitely. One effect of this – I won’t say downside – is that I don’t get through as many podcast episodes as I used to. Only a select few podcasts make t...
How I resurrected my ancient PhD thesis using R/bookdown (and some other tools)
An ancient thesis
I’ve long admired the look of publications generated using the R bookdown package, and thought it would be fun and educational to publish one myself. The problem is that I am not writing a book and have no plans to do so any time soon. Then I remembered that I’ve already written a book. There it is on the right. It’s called...
Gene names, data corruption and Excel: a 2021 update
It’s an old favourite of this blog, isn’t it? We had Gene name errors and Excel: lessons not learned (2012). Followed by Data corruption using Excel: 12+ years and counting (2016). Perhaps most depressingly of all, the conclusion of the trilogy, When your tools are broken, just change the data (2019-20). Well, I’m happy (?) to see the publi...