Publications by Econometrics and Free Software
R will always be arcane to those who do not make a serious effort to learn it…
R will always be arcane to those who do not make a serious effort to learn it. It is not meant to be intuitive and easy for casual users to just plunge into. It is far too complex and powerful for that. But the rewards are great for serious data analysts who put in the effort. — Berton Gunter R-help August 2007 I’ve posted this quote on twi...
What’s the fastest way to search and replace strings in a data frame?
I’ve tweeted this:

Just changed like 100 grepl calls to stringi::stri_detect and my pipeline now runs 4 times faster #RStats
— Bruno Rodrigues (@brodriguesco) July 20, 2022

Much discussion ensued. Some people were surprised, because in their experience, grepl() was faster than alternatives, especially if you set the perl parameter in grepl(...
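To show the kind of comparison being discussed, here is a rough benchmarking sketch. The character vector and the pattern "an" are made up, and the stringi part only runs if that package is installed. Timings depend heavily on the machine, the data and the pattern, so no numbers are claimed here. The sketch times base grepl() (default, perl = TRUE, fixed = TRUE) and gsub() against stringi::stri_detect_regex() and stringi::stri_replace_all_regex().

```r
# Rough timing sketch: base R vs stringi for detecting and replacing a pattern.
# The data and the pattern "an" are made up for illustration.
set.seed(123)
words <- c("apple", "banana", "cherry", "date", "elderberry")
x <- sample(words, 1e5, replace = TRUE)

# Detection: the three base R variants should agree on a literal pattern
res_default <- grepl("an", x)
res_perl <- grepl("an", x, perl = TRUE)
res_fixed <- grepl("an", x, fixed = TRUE)
stopifnot(identical(res_default, res_perl), identical(res_default, res_fixed))

# Replacement with base R
res_gsub <- gsub("an", "AN", x, fixed = TRUE)

# Elapsed seconds for 20 repetitions of a call
time_it <- function(expr_fun) system.time(for (i in 1:20) expr_fun())[["elapsed"]]

timings <- c(
  grepl_default = time_it(function() grepl("an", x)),
  grepl_perl = time_it(function() grepl("an", x, perl = TRUE)),
  grepl_fixed = time_it(function() grepl("an", x, fixed = TRUE)),
  gsub_fixed = time_it(function() gsub("an", "AN", x, fixed = TRUE))
)

# stringi equivalents, only if the package is available
if (requireNamespace("stringi", quietly = TRUE)) {
  res_stri <- stringi::stri_detect_regex(x, "an")
  stopifnot(identical(res_default, res_stri))
  timings <- c(
    timings,
    stri_detect = time_it(function() stringi::stri_detect_regex(x, "an")),
    stri_replace = time_it(function() stringi::stri_replace_all_regex(x, "an", "AN"))
  )
}

print(timings)
```

Note that fixed = TRUE only makes sense when the pattern is a literal string rather than a regular expression.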
Capture errors, warnings and messages
In my last video I tried to add a feature to my {loud} package (more info here) and I succeeded. But in succeeding, I realised that I would need to write a bit more code than I expected. To make a long story short: it is possible to capture errors using purrr::safely():

library(purrr)
safe_log <- safely(log)
a <- safe_log("10")
str(a)
## ...
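purrr::safely() wraps a function so that it returns a list with result and error components. purrr::quietly() does something similar for printed output, warnings and messages. The base-R sketch below captures all three kinds of conditions at once. capture_all() is a hypothetical helper and is not part of purrr or of {loud}. It combines tryCatch() for errors with withCallingHandlers() for warnings and messages.

```r
# capture_all() is a hypothetical helper, not part of purrr or {loud}.
# It wraps a function f and returns a new function that records the result,
# any error message, and all warnings and messages raised along the way.
capture_all <- function(f) {
  function(...) {
    error <- NULL
    warnings <- character(0)
    messages <- character(0)
    result <- tryCatch(
      withCallingHandlers(
        f(...),
        warning = function(w) {
          # Record the warning, then resume evaluation without printing it
          warnings <<- c(warnings, conditionMessage(w))
          invokeRestart("muffleWarning")
        },
        message = function(m) {
          messages <<- c(messages, conditionMessage(m))
          invokeRestart("muffleMessage")
        }
      ),
      error = function(e) {
        # An error stops f(); keep its message and return NULL as the result
        error <<- conditionMessage(e)
        NULL
      }
    )
    list(result = result, error = error, warnings = warnings, messages = messages)
  }
}

safe_log <- capture_all(log)
a <- safe_log("10")  # error captured, result is NULL
b <- safe_log(-1)    # warning captured, result is NaN
chatty <- capture_all(function(x) {
  message("computing log")
  log(x)
})
d <- chatty(100)     # message captured, result is log(100)
str(a)
```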
Bootstrapping standard errors for difference-in-differences estimation with R
I’m currently working on a paper (with my colleague Vincent Vergnat, who is also a PhD candidate at BETA) where I want to estimate the causal impact of the birth of a child on hourly and daily wages as well as yearly worked hours. For this we are using non-parametric difference-in-differences (henceforth DiD) and thus have to bootstrap the stand...
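Here is a minimal sketch of the idea, assuming a simple two-group, two-period design. The DiD estimate is (mean of treated after − mean of treated before) − (mean of controls after − mean of controls before). The bootstrap resamples individuals with replacement, not rows, so each person's before and after observations stay together. It re-estimates the effect on each resample, and the standard deviation of those estimates is the bootstrap standard error. The panel is simulated with a made-up effect of -2. did_estimate() and bootstrap_did() are hypothetical names, and this is not the estimator used in the paper.

```r
set.seed(42)

# Simulated panel: 200 individuals observed before (post = 0) and after (post = 1).
# Half are treated; the true treatment effect on the made-up wage is -2.
n <- 200
panel <- data.frame(
  id = rep(seq_len(n), times = 2),
  treated = rep(rep(c(0, 1), each = n / 2), times = 2),
  post = rep(c(0, 1), each = n)
)
panel$wage <- 20 + panel$treated + 0.5 * panel$post -
  2 * panel$treated * panel$post + rnorm(nrow(panel))

# 2x2 difference-in-differences from the four group means
did_estimate <- function(d) {
  cell_mean <- function(tr, po) mean(d$wage[d$treated == tr & d$post == po])
  (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
}

# Non-parametric bootstrap: resample individuals with replacement, keeping
# both of each individual's observations, and re-estimate B times
bootstrap_did <- function(d, B = 500) {
  rows_by_id <- split(seq_len(nrow(d)), d$id)
  replicate(B, {
    drawn <- sample(names(rows_by_id), length(rows_by_id), replace = TRUE)
    did_estimate(d[unlist(rows_by_id[drawn], use.names = FALSE), ])
  })
}

estimate <- did_estimate(panel)
boot_estimates <- bootstrap_did(panel)
boot_se <- sd(boot_estimates)
c(estimate = estimate, std_error = boot_se)
```

A percentile confidence interval can be read off the same draws with quantile(boot_estimates, c(0.025, 0.975)).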
Unit testing with R
I was introduced to unit testing while working with colleagues on quite a big project for which we use Python. At first I was a bit skeptical about the need for writing unit tests, but now I must admit that I am seduced by the idea and by the huge time savings it allows. Naturally, I was wondering if the same could be achieved with R, and wa...
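Here is a sketch of what this can look like in R, using the testthat package. The package is assumed to be installed, and the tests are skipped otherwise. The function under test, mph_to_kmh(), is a hypothetical example that reuses the miles-per-hour to km/h factor of 1.609344.

```r
# Hypothetical function under test: convert miles per hour to km/h
mph_to_kmh <- function(speed) {
  if (!is.numeric(speed)) stop("speed must be numeric")
  speed * 1.609344
}

if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("mph_to_kmh converts correctly", {
    testthat::expect_equal(mph_to_kmh(10), 16.09344)
    testthat::expect_equal(mph_to_kmh(c(0, 100)), c(0, 160.9344))
  })

  testthat::test_that("mph_to_kmh rejects non-numeric input", {
    testthat::expect_error(mph_to_kmh("10"), "must be numeric")
  })
}
```

In a package, tests like these usually live in files under tests/testthat/ and can be run with devtools::test() or testthat::test_dir().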
Careful with tryCatch
tryCatch is one of the functions that allow users to handle errors in a simple way. With it, you can do things like: if(error), then(do this). Take the following example:

sqrt("a")
Error in sqrt("a") : non-numeric argument to mathematical function

Now maybe you’d want something to happen when such an error happens. You can achieve that wit...
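Below is a small sketch of the basic pattern, plus one behaviour that often surprises people. That behaviour is not necessarily the caveat the full post is about. When tryCatch() handles a warning, the rest of the expression is abandoned. withCallingHandlers(), in contrast, can handle the warning and let evaluation continue. safe_sqrt() is a hypothetical name.

```r
# Return NA (with a message) instead of stopping when sqrt() fails
safe_sqrt <- function(x) {
  tryCatch(
    sqrt(x),
    error = function(e) {
      message("Caught an error: ", conditionMessage(e))
      NA_real_
    }
  )
}

safe_sqrt(4)    # 2
safe_sqrt("a")  # NA, plus a message

# tryCatch(): once the warning handler runs, the remaining code is skipped
res_trycatch <- tryCatch({
  y <- sqrt(-1)  # signals "NaNs produced"
  "reached the end"
}, warning = function(w) "aborted at the warning")

# withCallingHandlers(): handle the warning, muffle it, and keep going
res_calling <- withCallingHandlers({
  y <- sqrt(-1)
  "reached the end"
}, warning = function(w) invokeRestart("muffleWarning"))

res_trycatch  # "aborted at the warning"
res_calling   # "reached the end"
```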
Data frame columns as arguments to dplyr functions
Suppose that you would like to create a function which does a series of computations on a data frame. You would like to pass a column as this function’s argument. Something like:

data(cars)
convertToKmh <- function(dataset, col_name){
  dataset$col_name <- dataset$speed * 1.609344
  return(dataset)
}

This example is obviously not very interesti...
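In a function like convertToKmh(), dataset$col_name creates a column literally called col_name, whatever value is passed as the argument. The sketch below shows two ways to use the argument's value instead, assuming it is the name of the new column. In base R, use [[ ]] with a string. With recent dplyr versions, tidy evaluation's {{ }} operator (rlang 0.4.0 and later) forwards a bare column name through {{ col_name }} :=. convert_to_kmh() is a hypothetical name for the dplyr variant, which only runs if dplyr is installed. This is not necessarily the approach taken in the full post.

```r
data(cars)

# Base R: [[ ]] uses the value of col_name, whereas $col_name is taken literally
convertToKmh <- function(dataset, col_name) {
  dataset[[col_name]] <- dataset$speed * 1.609344
  dataset
}

# dplyr with tidy evaluation: {{ }} forwards a bare column name to mutate()
convert_to_kmh <- function(dataset, col_name) {
  dplyr::mutate(dataset, {{ col_name }} := speed * 1.609344)
}

head(convertToKmh(cars, "speed_kmh"))

if (requireNamespace("dplyr", quietly = TRUE)) {
  head(convert_to_kmh(cars, speed_kmh))
}
```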
Read a lot of datasets at once with R
I often have to read a lot of datasets at once using R. So I wrote the following function to solve this issue:

read_list <- function(list_of_datasets, read_func){
  read_and_assign <- function(dataset, read_func){
    dataset_name <- as.name(dataset)
    dataset_name <- read_func(dataset)
  }
  # i...
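Here is a compact alternative sketch, assuming all files share one format. It reads every file with lapply() and keeps the results in a list named after the files. The CSV files are created in a temporary directory with made-up data, and read_list_sketch() is a hypothetical name.

```r
# Create a few example CSV files in a temporary directory (made-up data)
tmp <- tempfile("datasets_")
dir.create(tmp)
for (i in 1:3) {
  write.csv(data.frame(id = 1:5, value = i * (1:5)),
            file.path(tmp, paste0("dataset_", i, ".csv")),
            row.names = FALSE)
}

# Read every file with read_func and name each element after its file
read_list_sketch <- function(paths, read_func = read.csv) {
  datasets <- lapply(paths, read_func)
  names(datasets) <- tools::file_path_sans_ext(basename(paths))
  datasets
}

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
my_datasets <- read_list_sketch(files)
str(my_datasets, max.level = 1)
```

Any reading function that takes a path as its first argument, such as readr::read_csv, can be passed as read_func. If separate objects are really wanted, list2env(my_datasets, envir = globalenv()) creates one per list element.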
Merge a list of datasets together
Last week I showed how to read a lot of datasets at once with R, and this week I’ll continue from there and show a very simple function that uses this list of read datasets and merges them all together. First we’ll use read_list() to read all the datasets at once (for more details, read last week’s post):

library("readr")
library("tibble")
...
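Here is a minimal sketch of merging a whole list of data frames, assuming they share a key column called id. Reduce() applies merge() pairwise from left to right. The three small data frames are made up, and merge_list() is a hypothetical name. In the tidyverse, purrr::reduce(datasets, dplyr::full_join, by = "id") is the analogous approach.

```r
# Three made-up datasets sharing an "id" key
datasets <- list(
  a = data.frame(id = 1:4, x = c(10, 20, 30, 40)),
  b = data.frame(id = 2:5, y = c("b", "c", "d", "e")),
  c = data.frame(id = c(1L, 3L, 4L), z = c(TRUE, FALSE, TRUE))
)

# Merge every data frame in the list, two at a time, from left to right.
# all = FALSE keeps only ids present everywhere (inner join);
# all = TRUE keeps every id and fills the gaps with NA (full outer join).
merge_list <- function(list_of_datasets, by = "id", all = FALSE) {
  Reduce(function(x, y) merge(x, y, by = by, all = all), list_of_datasets)
}

inner <- merge_list(datasets)
outer <- merge_list(datasets, all = TRUE)
inner
outer
```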
I’ve started writing a ‘book’: Functional programming and unit testing for data munging with R
I have started writing a ‘book’ using the awesome bookdown package. In the book I explain and show why using functional programming and putting your functions in your own packages is the way to go when you want to clean, prepare and transform large data sets. It makes testing and documenting your code easier. You don’t need to think about m...