Publications by Gavin L. Simpson
Introducing gratia
I use generalized additive models (GAMs) in my research work. I use them a lot! Simon Wood’s mgcv package is an excellent set of software for specifying, fitting, and visualizing GAMs for very large data sets. Despite recently dabbling with brms, mgcv is still my go-to GAM package. The only down-side to mgcv is that it is not very tidy-aware an...
9360 sym R (2369 sym/13 pcs) 10 img
Confidence intervals for GLMs
You’ve estimated a GLM or a related model (GLMM, GAM, etc.) for your latest paper and, like a good researcher, you want to visualise the model and show the uncertainty in it. In general this is done using confidence intervals, typically with 95% coverage. If you remember a little bit of theory from your stats classes, you may recall that such a...
10694 sym R (4703 sym/19 pcs) 6 img
Tibbles, checking examples, & character encodings
Recently I’ve been preparing my gratia package for submission to CRAN. During my pre-flight testing I noticed an issue under Windows when checking the examples in the package against the reference output I generated on Linux. In the latest release of the tibble package, the way tibbles are printed has changed subtly and in a way that leads to cross-...
4687 sym R (3328 sym/7 pcs)
radian: a modern console for R
Whenever I’m developing R code or writing data wrangling or analysis scripts for research projects that I work on I use Emacs and its add-on package Emacs Speaks Statistics (ESS). I’ve done so for nigh on a couple of decades now, ever since I switched full time to running Linux as my daily OS. For years this has served me well, though I would...
6398 sym R (614 sym/6 pcs) 8 img
Pivoting tidily
One of the fun bits of my job is that I have actual time dedicated to helping colleagues and grad students with statistical or computational problems. Recently I’ve been helping one of our Lab Instructors with some data from their Plant Physiology Lab course. Whilst I was writing some R code to import the raw data for the lab from an Excel...
12035 sym R (5854 sym/14 pcs) 10 img
Rendering your README with GitHub Actions
There’s one thing that has bugged me for a while about developing R packages. We have all these nice, modern tools for tracking our code, producing web sites from the roxygen documentation, and so on. Yet for every code commit I make to the master branch of a package repo, there are often two or more additional steps I need to take to ke...
12165 sym R (3143 sym/11 pcs)
gratia 0.4.1 released
After a slight snafu related to the 1.0.0 release of dplyr, a new version of gratia is out and available on CRAN. This release brings a number of new features, including differences of smooths, partial residuals on partial plots of univariate smooths, and a number of utility functions, while under the hood gratia works for a wider range of models...
3937 sym R (2525 sym/8 pcs) 6 img
Extrapolating with B splines and GAMs
An issue that often crops up when modelling with generalized additive models (GAMs), especially with time series or spatial data, is how to extrapolate beyond the range of the data used to train the model. The issue arises because GAMs use splines to learn from the data using basis functions. The splines themselves are built from basis functions ...
16608 sym R (10906 sym/32 pcs) 14 img
Two new versions of gratia released
While the Covid-19 pandemic and teaching a new course in the fall put paid to most of my development time last year, some time off work this January allowed me time to work on gratia again. I released 0.5.0 to CRAN in part to fix an issue with tests not running on the new M1 chips from Apple because I wasn’t using vdiffr conditionally...
8333 sym R (4885 sym/15 pcs) 6 img
Getting data from the Canada Covid-19 Tracker using R
Last semester (Fall 2020) I taught a new course in healthcare data science for the Johnson Shoyama Graduate School in Public Policy. One of the final topics of the course was querying application programming interfaces (APIs) from within R. The example we used was querying data on the Covid-19 pandemic from the Covid-19 Tracker Canada, which has ...
5816 sym R (4408 sym/16 pcs) 4 img