Publications by Econometrics and Free Software

Data frame columns as arguments to dplyr functions

17.07.2016

Suppose that you would like to create a function which does a series of computations on a data frame. You would like to pass a column as this function’s argument. Something like: data(cars) convertToKmh <- function(dataset, col_name){ dataset$col_name <- dataset$speed * 1.609344 return(dataset) } This example is obviously not very interesti...

3388 sym R (2838 sym/12 pcs)
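
The excerpt above stops before the post's own solution, so what follows is only a minimal sketch of the underlying problem, written in base R rather than dplyr. `dataset$col_name` creates a column literally named `col_name`; indexing with `[[ ]]` uses the value of the argument instead. The new column name `"speed_kmh"` is an assumption chosen for illustration.

```r
data(cars)

# col_name is passed as a string. [[ ]] evaluates it, whereas
# dataset$col_name would create a column literally called "col_name".
convertToKmh <- function(dataset, col_name) {
  dataset[[col_name]] <- dataset$speed * 1.609344
  dataset
}

# "speed_kmh" is a hypothetical column name, not taken from the post.
cars_kmh <- convertToKmh(cars, "speed_kmh")
head(cars_kmh)
```

The post itself is about doing this with dplyr functions, so its approach may well differ from this base R version.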

Read a lot of datasets at once with R

25.07.2016

I often have to read a lot of datasets at once using R. So I wrote the following function to solve this issue: read_list <- function(list_of_datasets, read_func){ read_and_assign <- function(dataset, read_func){ dataset_name <- as.name(dataset) dataset_name <- read_func(dataset) } # i...

1245 sym R (2386 sym/4 pcs)
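
The body of `read_list()` is cut off in the excerpt above, so here is a rough sketch of one way to read many files into a named list with `lapply()`. The helper `read_all()`, the temporary CSV files and their names are all assumptions made for this example, not the post's code.

```r
# Hypothetical helper (not the post's read_list()): read every path with
# read_func and name each list element after its file.
read_all <- function(paths, read_func = utils::read.csv) {
  datasets <- lapply(paths, read_func)
  names(datasets) <- tools::file_path_sans_ext(basename(paths))
  datasets
}

# Write two small CSV files to a temporary directory so the example runs
# on its own.
tmp_dir <- tempfile("datasets")
dir.create(tmp_dir)
write.csv(head(mtcars), file.path(tmp_dir, "data2000.csv"), row.names = FALSE)
write.csv(tail(mtcars), file.path(tmp_dir, "data2001.csv"), row.names = FALSE)

paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
all_data <- read_all(paths)
names(all_data)
```

Returning a named list, instead of assigning each dataset to its own variable, keeps everything in one object that can later be passed to `lapply()` or `Reduce()`.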

Merge a list of datasets together

29.07.2016

Last week I showed how to read a lot of datasets at once with R, and this week I’ll continue from there and show a very simple function that uses this list of read datasets and merges them all together. First we’ll use read_list() to read all the datasets at once (for more details read last week’s post): library("readr") library("tibble") ...

1914 sym R (2113 sym/6 pcs)
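
The merging function is not visible in the excerpt above. A common base R way to merge a list of data frames, shown here as a sketch under that assumption, is to fold `merge()` over the list with `Reduce()`. The helper name `merge_all()` and the toy data frames are hypothetical.

```r
# Hypothetical helper: merge the first two data frames, then merge the
# result with the third, and so on.
merge_all <- function(datasets, by) {
  Reduce(function(x, y) merge(x, y, by = by), datasets)
}

# Toy data, invented for this example.
df1 <- data.frame(id = 1:3, a = c(10, 20, 30))
df2 <- data.frame(id = 1:3, b = c("x", "y", "z"))
df3 <- data.frame(id = 2:4, flag = c(TRUE, FALSE, TRUE))

merged <- merge_all(list(df1, df2, df3), by = "id")
merged
```

By default `merge()` keeps only the keys present in both inputs, so only the ids shared by all three data frames survive; passing `all = TRUE` inside the helper would keep every id instead.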

I’ve started writing a ‘book’: Functional programming and unit testing for data munging with R

03.11.2016

I have started writing a ‘book’ using the awesome bookdown package. In the book I explain and show why using functional programming and putting your functions in your own packages is the way to go when you want to clean, prepare and transform large data sets. It makes testing and documenting your code easier. You don’t need to think about m...

1186 sym

Work on lists of datasets instead of individual datasets by using functional programming

20.12.2016

Analyzing a lot of datasets can be tedious. In my work, I often have to compute descriptive statistics or plot graphs for the same variables in many datasets. The variables in question have the same name across the datasets but are measured for different years. As an example, imagine you have this situation: data2000 <- mtcars data2001 <-...

2924 sym R (7590 sym/13 pcs)
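
As a small sketch of the idea in the excerpt above: put the yearly datasets in a named list and apply the same computation to each element. The excerpt is cut off after `data2001 <-`, so the second dataset and the tweak applied to it below are assumptions made purely so the two years differ.

```r
data2000 <- mtcars

# Assumption: the post's definition of data2001 is truncated, so this
# copy with a shifted cyl column is invented for illustration.
data2001 <- mtcars
data2001$cyl <- data2001$cyl + 1

datasets <- list(data2000 = data2000, data2001 = data2001)

# One call computes the statistic for every year; the result is a named
# numeric vector with one entry per dataset.
mean_cyl <- sapply(datasets, function(d) mean(d$cyl))
mean_cyl
```

Adding another year then only means adding one more element to `datasets`; the analysis code stays unchanged.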

Functional programming and unit testing for data munging with R available on Leanpub

23.12.2016

The book I’ve been working on these past months (you can read about it here, and read it for free here) is now available on Leanpub! You can grab a copy and read it on your ebook reader or on your computer, and what’s even better is that it is available for free (but you can also decide to buy it if you really like it). Here is the link on L...

1728 sym

My free book has a cover!

23.12.2016

I’m currently writing a book as a hobby. It’s titled Functional programming and unit testing for data munging with R and you can get it for free here. You can also read it online for free on my webpage. What’s the book about? Here’s the teaser text: Learn the basics of functional programming, unit testing and package development for the R...

1358 sym 2 img

How to use jailbreakr

17.02.2017

What is jailbreakr? The jailbreakr package is probably one of the most interesting packages I have come across recently. This package makes it possible to extract messy data from spreadsheets. What is meant by messy? I am sure you have already had to deal with spreadsheets that contained little tables inside a single sheet, for example. As far as I know, the...

4202 sym R (9832 sym/9 pcs)

Lesser known dplyr tricks

08.03.2017

In this blog post I share some lesser-known (at least I believe they are) tricks that use mainly functions from dplyr. Removing unneeded columns: Did you know that you can use - in front of a column name to remove it from a data frame? mtcars %>% select(-disp) %>% head() ## mpg cyl hp drat wt qsec vs am gear car...

2097 sym R (8435 sym/10 pcs)