checks and {tiny}testing - a quick primer

A quick intro to checking and testing your functions, with help from {checkmate} and {tinytest}
This material was presented to a meeting of KIND (Knowledge and Information Network) in April this year.
checks
- What assumptions are you making about your data? (structure, names, types etc.)
- function arguments
- what users will and won’t do
tests
Describe what you expect your functions to do, and how they should behave with regard to user inputs
Checks : assertions
Tests : expectations
Let’s write a simple function that prints the name of a council area:
choose_council <- function(x){
out <- paste("chosen council is", x)
out
}
Now let’s try it out
choose_council("Highland")
[1] "chosen council is Highland"
choose_council("Argyll and Bute")
[1] "chosen council is Argyll and Bute"
choose_council("Bob")
[1] "chosen council is Bob"
choose_council(1)
[1] "chosen council is 1"
choose_council("Argyll & Bute")
[1] "chosen council is Argyll & Bute"
We can see the function works, but it’s simply printing whatever we pass it, even if the inputs make no real sense.
Base R functions
From the help:
match.arg
matches a character arg against a table of candidate values as specified by choices.
To put that more simply, to use the function, we need to pass an argument, and a vector of possible choices. The function will then check that argument against the choices to see if there is a match.
Let’s assume we only want to print "Highland" and "Argyll and Bute". How can we use match.arg?
choose_council <- function(council = c("Highland",
"Argyll and Bute")){
res <- match.arg(council)
out <- paste("chosen council is", res)
return(out)
}
choose_council("Highland")
[1] "chosen council is Highland"
choose_council("Argyll and Bute")
[1] "chosen council is Argyll and Bute"
choose_council("Bob")
Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"
choose_council(1)
Error in match.arg(council): 'arg' must be NULL or a character vector
choose_council("Argyll & Bute")
Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"
If no value is supplied, match.arg uses the first element
choose_council() # match.arg uses default arguments
[1] "chosen council is Highland"
Partial matching is also possible: you can be lazy and only type the first few letters of your argument. This is OK for this very simple example, but not for real-life code, and certainly not any code where you care about the results. (As an aside, if you regularly use T or F instead of TRUE and FALSE, you need to sort your life out.)
This works, but .. careful now!
choose_council("A") # partial matching - can be risky
[1] "chosen council is Argyll and Bute"
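If partial matching is too risky, one option is to compare the argument against the allowed values yourself. This is only a minimal sketch in base R; choose_council_exact and valid_councils are names made up for illustration.

```r
# Hypothetical exact-match variant: no partial matching allowed
choose_council_exact <- function(council) {
  valid_councils <- c("Highland", "Argyll and Bute")
  # %in% compares whole strings, so "A" no longer matches "Argyll and Bute"
  if (!is.character(council) || length(council) != 1 ||
      !council %in% valid_councils) {
    stop("council must be one of: ", paste(valid_councils, collapse = ", "))
  }
  paste("chosen council is", council)
}

choose_council_exact("Highland")   # works as before
try(choose_council_exact("A"))     # now an error rather than a silent match
```

The trade-off is that users lose the convenience of abbreviations, which is usually what you want in code where the results matter.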
stopifnot
We saw that our function didn’t work when we supplied a number.
choose_council(1)
In this case, match.arg has its own checks in the background. But we can provide our own. We want to stop the function if a non-character argument is provided.
We use stopifnot to fail immediately if a non-character argument is passed.
If a character argument is passed, we use the choices argument of match.arg to validate that this is an acceptable value
choose_council <- function(council){
stopifnot(is.character(council))
res <- match.arg(council,
choices = c("Highland",
"Argyll and Bute"))
out <- paste("chosen council is", res)
return(out)
}
choose_council(1)
Error in choose_council(1): is.character(council) is not TRUE
choose_council("Argyll & Bute")
Error in match.arg(council, choices = c("Highland", "Argyll and Bute")): 'arg' should be one of "Highland", "Argyll and Bute"
Yikes.
We can add friendlier messages
choose_council <- function(council){
stopifnot("council must be character" = is.character(council))
res <- match.arg(council,
choices = c("Highland",
"Argyll and Bute"))
out <- paste("chosen council is", res)
return(out)
}
Partial matching works as before
choose_council("High")
[1] "chosen council is Highland"
But now we get a slightly more readable error message
choose_council(1)
Error in choose_council(1): council must be character
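stopifnot accepts several conditions at once and reports the first one that fails. The sketch below adds a length check as an illustration (named conditions need R 4.0.0 or later). The name choose_council_strict is hypothetical, as is the second condition. The length check is worth considering because match.arg treats a vector identical to choices as "use the default" rather than as an error.

```r
# Hypothetical variant with two named conditions
choose_council_strict <- function(council) {
  stopifnot(
    "council must be character" = is.character(council),
    "council must be a single value" = length(council) == 1
  )
  res <- match.arg(council, choices = c("Highland", "Argyll and Bute"))
  paste("chosen council is", res)
}

choose_council_strict("High")                                # partial matching still works
try(choose_council_strict(c("Highland", "Argyll and Bute"))) # stopped by the length check
```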
checks with {checkmate}
See {phsmethods} for an existing CHI check. Here we will attempt to carry out the same checks, but with less effort.
What is a CHI number? The Community Health Index number is used in Scotland to uniquely identify patients.
What needs to be checked?
- Does it contain only numeric characters?
- Is it ten digits in length?
- Do the first six digits denote a valid date?
- Is the checksum digit correct?
We can deal with the first three quite quickly with the {checkmate} package
checkmate
“Virtually every standard type of user error when passing arguments into function can be caught with a simple, readable line which produces an informative error message.
A substantial part of the package was written in C to minimize any worries about execution time overhead.”
example CHI
x <- "0101011237"
is this a character vector?
check_class(x, "character")
checkClass(x, "character")
[1] TRUE
[1] TRUE
check_class and checkClass are exactly the same; simply choose whether you prefer snake_case or camelCase.
Functions beginning with check return either TRUE (as above) or the error message as a string
check_class(x, "integer")
[1] "Must inherit from class 'integer', but has class 'character'"
Functions beginning with assert either throw an error, or return the checked object invisibly:
assert_class(x, "integer")
Error in eval(expr, envir, enclos): Assertion on 'x' failed: Must inherit from class 'integer', but has class 'character'.
assert_class(x, "character")
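Alongside check and assert, {checkmate} also provides functions beginning with test, which return only TRUE or FALSE. A short sketch comparing the three families on the same example value:

```r
library(checkmate)

x <- "0101011237"

# test_* returns a plain TRUE / FALSE, handy inside if() statements
test_class(x, "character")
test_class(x, "integer")

# check_* returns TRUE, or the error message as a string
msg <- check_class(x, "integer")
is.character(msg)

# assert_* hands back the checked object invisibly on success,
# so the result can be assigned
y <- assert_class(x, "character")
identical(x, y)
```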
Going back to the CHI example, we can use check_character for a more fine-grained series of checks
check_character(x, n.chars = 10, pattern = "\\d{10}") # 10 chars, numeric only
[1] TRUE
x2 <- "010101123A"
x3 <- c(x, x2, NA)
x4 <- c(x, NA)
check_character(x2, n.chars = 10, pattern = "[^A-Z]{10}")
check_character(x, n.chars = 10, pattern = "[^A-Z]{10}")
[1] "Must comply to pattern '[^A-Z]{10}'"
[1] TRUE
# final version
check_character(x,
min.len = 1,
n.chars = 10,
any.missing = FALSE,
pattern = "\\d{10}")
vals <- c(x, x2, x3, x4)
cat(vals)
0101011237 010101123A 0101011237 010101123A NA 0101011237 NA
purrr::map_chr(vals,
check_character,
min.len = 1,
n.chars = 10,
any.missing = FALSE,
pattern = "\\d{10}")
[1] "TRUE" "Must comply to pattern '\\d{10}'"
[3] "TRUE" "Must comply to pattern '\\d{10}'"
[5] "Contains missing values (element 1)" "TRUE"
[7] "Contains missing values (element 1)"
# are first 6 elements a Date?
date_val <- substr(x,1,6)
cat(date_val)
checkDate(as.Date(strptime(date_val,"%d%m%y", "UTC")),
lower = "1900-01-01",
upper = Sys.Date(),
any.missing = FALSE,
min.len = 1L)
010101
[1] TRUE
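One caveat with the date check: %y only sees two digits, and R's strptime reads 00 to 68 as 20xx and 69 to 99 as 19xx. A date of birth in 1950 is therefore parsed as 2050 and would fail the upper bound. Below is a rough sketch of one possible workaround, under the assumption that any date landing in the future really belongs to the previous century. parse_chi_dob is a hypothetical helper, and the CHI used is a made-up value.

```r
# Hypothetical helper: parse the first six digits, then pull future dates
# back by 100 years
parse_chi_dob <- function(chi, today = Sys.Date()) {
  dob <- as.Date(strptime(substr(chi, 1, 6), "%d%m%y", tz = "UTC"))
  in_future <- !is.na(dob) & dob > today
  # assumption: a date of birth in the future belongs to the previous century
  year <- as.integer(format(dob[in_future], "%Y")) - 100L
  dob[in_future] <- as.Date(paste0(year, format(dob[in_future], "-%m-%d")),
                            format = "%Y-%m-%d")
  dob
}

as.Date(strptime("010150", "%d%m%y", tz = "UTC"))  # two-digit year 50 is read as 2050
parse_chi_dob("0101501234")                        # adjusted date of birth
```

This cannot tell a newborn from a centenarian, so treat it as a starting point rather than a rule.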
combine checks with the assert function
main_check <- function(x){
assert(check_character(x,
min.len = 1,
n.chars = 10,
any.missing = FALSE,
pattern = "\\d{10}"),
checkDate(as.Date(strptime(substr(x,1,6),"%d%m%y", "UTC")),
lower = "1900-01-01",
upper = Sys.Date(),
any.missing = FALSE,
min.len = 1L),
combine = "and")
}
out <- main_check(x)
out
[1] TRUE
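The fourth check in the list, the checksum digit, is not covered by the checks above. Here is a minimal sketch, assuming the modulus-11 scheme commonly described for CHI numbers: weight the first nine digits by 10 down to 2, sum them, and compare 11 minus the remainder with the tenth digit. chi_checksum_ok is a hypothetical helper name, and it assumes the input has already passed the character and ten-digit checks.

```r
# Hypothetical helper: validate the tenth (checksum) digit of one CHI number
chi_checksum_ok <- function(chi) {
  digits <- as.integer(strsplit(chi, "")[[1]])
  weighted_sum <- sum(digits[1:9] * 10:2)
  expected <- 11 - (weighted_sum %% 11)
  # a remainder of 0 gives 11, which is written as check digit 0
  if (expected == 11) expected <- 0
  # expected == 10 can never equal a single digit, so such numbers fail
  expected == digits[10]
}

chi_checksum_ok("0101011237")  # the example CHI used throughout
chi_checksum_ok("0101011238")  # same number with the last digit changed
```

For the example CHI the weighted sum is 37, the remainder is 4, and 11 - 4 = 7, which matches the final digit.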
shorthand checking for the lazy / efficient programmer
qassert: built-in data types
qassertr: lists and data frames
qassert(x,"S+[10,11)") # character, vector length at least 1 (`+`), lower bound 10 and less than 11
qassert(x,"S+[10,10]") # also works, between 10 and 10 (inclusive)
# note difference in closing brackets
# character denoted by `s`
# no missing values denoted by UPPER CASE
# exact length of string 10 denoted by [10]
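The same rule strings work with qtest, which returns TRUE or FALSE instead of throwing an error, and with qtestr / qassertr, which apply one rule to every element of a list or column of a data frame. A small sketch; chi_df is a made-up data frame for illustration.

```r
library(checkmate)

# qtest() is the TRUE / FALSE counterpart of qassert()
qtest("0101011237", "S1")   # one non-missing string
qtest(NA_character_, "S1")  # upper case S forbids missing values
qtest(NA_character_, "s1")  # lower case s allows them

# qtestr() / qassertr() apply the same rule to every column
chi_df <- data.frame(chi = c("0101011237", "010101123A"),
                     council = c("Highland", "Argyll and Bute"),
                     stringsAsFactors = FALSE)
qtestr(chi_df, "S2")        # each column: character, no NAs, length 2
qassertr(chi_df, "S+")      # silent on success
```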
testing
We can also use {tinytest} for some checks
tinytest::expect_inherits(x, "character")
----- PASSED : <-->
call| tinytest::expect_inherits(x, "character")
Normally we’d list some expectations. Here’s a useless function that adds 2 to a given numerical value
add_two <- function(x) {
if (is.character(x)) {
stop("You've passed a character vector.\nGonnae no' dae that? \nIt should be an integer or double")
}
checkmate::assert_count(x)
checkmate::assert_integerish(x)
stopifnot("x must not be missing" = !checkmate::anyMissing(x))
x <- x + 2
message("ya wee beauty!")
return(x)
}
using("checkmate")
# test add_two works
expect_equal(1 + 2, add_two(1))
ya wee beauty!
----- PASSED : <-->
call| expect_equal(1 + 2, add_two(1))
add_two("one")
Error in add_two("one"): You've passed a character vector.
Gonnae no' dae that?
It should be an integer or double
expect_error(add_two("one"))
----- PASSED : <-->
call| expect_error(add_two("one"))
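A fuller set of expectations would also cover the success message and the other inputs the checks are meant to reject. The sketch below uses add_two_demo, a simplified stand-in for add_two that is redefined here only so the block runs on its own.

```r
library(tinytest)

# simplified, hypothetical stand-in for add_two()
add_two_demo <- function(x) {
  checkmate::assert_count(x)   # a single, non-missing, non-negative whole number
  message("ya wee beauty!")
  x + 2
}

expect_message(add_two_demo(1), pattern = "beauty")  # success message is emitted
expect_error(add_two_demo(-1))                       # negative numbers are rejected
expect_error(add_two_demo(NA))                       # missing values are rejected
expect_error(add_two_demo(c(1, 2)))                  # more than one value is rejected
```

In a package, expectations like these would normally live in a test file and be run with tinytest::run_test_file or tinytest::test_package rather than typed at the console.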
Further steps
This has been a very quick introduction to the subject.
Writing tests and checks is boring, but it’s only a matter of time until you get bitten.
{checkmate} takes a lot of the pain out of the process, and {tinytest} makes carrying out these automated checks an extremely quick, painless process.
Check out my package {juncture} (a WIP re-write of {patientcounter}) that uses both packages to further reduce external dependencies.
You may also want to read and review defensive programming, covered in this excellent text by Gillespie and Lovelace, and investigate the use of purrr’s possibly and safely functions
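As a taster, here is a short sketch of how the two {purrr} wrappers behave, using log as a stand-in for any function that might fail. safe_log and possibly_log are names chosen for illustration.

```r
library(purrr)

# safely() never throws: it returns a list with `result` and `error`
safe_log <- safely(log)
res <- safe_log("a")
is.null(res$result)        # no result, because log("a") failed
inherits(res$error, "error")

# possibly() swaps any failure for a default value
possibly_log <- possibly(log, otherwise = NA_real_)
map_dbl(list(1, "a", 10), possibly_log)
```

Both are most useful when mapping over many inputs, where a single bad value would otherwise stop the whole run.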