Publications by Econometrics and Free Software
Reproducible data science with Nix, part 2 — running {targets} pipelines with Nix
This is the second post in a series of posts about Nix. Disclaimer: I’m a super beginner with Nix. So this series of blog posts is more akin to notes that I’m taking while learning than a super detailed tutorial. So if you’re a Nix expert and read something stupid in here, that’s normal. This post is going to focus on R (obviously) but the ...
12888 sym Python (2009 sym/14 pcs) 4 img
Reproducible data science with Nix
This is the first of a (hopefully) series of posts about Nix. Disclaimer: I’m a super beginner with Nix. So this series of blog posts is more akin to notes that I’m taking while learning than a super detailed tutorial. So if you’re a Nix expert and read something stupid in here, that’s normal. This post is going to focus on R (obviously) bu...
12713 sym Python (864 sym/7 pcs) 4 img
Automating checks of *handcrafted* Word tables with {docxtractr}
Unfortunately, not everyone knows about literate programming, so many tables in Word documents are “generated” by hand (generated is really too strong a word), and what very often happens is that these handcrafted tables have typos. Usually it’s totals that are wrong. Checking the totals in these tables by hand with a pocket calculator is a tedi...
7176 sym R (16649 sym/19 pcs) 6 img
Software engineering techniques that non-programmers who write a lot of code can benefit from — the DRY WIT approach
Data scientists, statisticians, analysts, researchers, and many other professionals write a lot of code. Not only do they write a lot of code, but they must also read and review a lot of it. They either work in teams and need to review each other’s code, or need to be able to reproduce results from past projects, be it for peer review o...
7195 sym Python (971 sym/4 pcs) 10 img
What I’ve learned making an .epub Ebook with Quarto
I’ve been working on an ebook (that you can read over here) made using Quarto. Since I’m also selling a DRM-free Epub and PDF on Leanpub I wanted to share some tips and tricks I’ve learned to generate an Epub that passes epubcheck using Quarto. Quarto is a tool made by Posit and is an open-source scientific and technical publishing tool. If ...
7203 sym Python (3894 sym/15 pcs) 6 img
A Linux Live USB as a statistical programming dev environment
This blog post is divided into two parts: in the first part I’ll show you how to create a Linux Live USB with persistent storage that can be used as a development environment, and in the second part I’ll show you the easiest way to set up RStudio and R in Ubuntu. Making your own, portable, development environment based on Ubuntu or Debian I’m ...
6529 sym R (524 sym/4 pcs) 6 img
How to deal with annoying medium sized data inside a Shiny app
This blog post is taken from a chapter of my ebook on building reproducible analytical pipelines, which you can read here. If you want to follow along, you can start by downloading the data I use here. This is a smaller dataset made from the one you can get here. Uncompressed it’ll be a 2.4GB file. Not big data in any sense, but big enough to be...
7580 sym R (4397 sym/8 pcs) 2 img
R, its license and my take on it
Foreword: This is not a tutorial nor anything like that. I’m going to talk about free software, open source, and their licenses. I’m going to give my (non-)expert opinion on it. You may find, after having finished reading this post, that I wasted your time. So only read if by some miracle the first sentence of the foreword excited you. If not...
5989 sym 6 img
Functional programming explains why containerization is needed for reproducibility
I’ve had some discussions online and in the real world about this blog post and I’d like to restate why containerization is needed for reproducibility, and do so through the lens of functional programming. When setting up a pipeline, whether you’re a functional programming enthusiast or not, you’re aiming at setting it up in a way that this p...
6155 sym Python (2308 sym/14 pcs) 4 img
Code longevity of the R programming language
I’ve been working on a way to evaluate how well old R code runs on the current version of R, and am now ready to share some results. It all started with this tweet: The problem is that you have to find old code lying around. Some people have found old code they wrote a decade or more ago and tried to rerun it; there’s this blog post by Thomas Lu...
8959 sym R (2336 sym/6 pcs) 10 img