Publications by Econometrics and Free Software

Reproducible data science with Nix, part 6 — CI/CD has never been easier

19.09.2023

Warning: I highly recommend you read this blog post first, which explains in detail how to run a pipeline inside Nix. This blog post assumes that you’ve read that one. It would also help if you’re familiar with GitHub Actions; if not, read this other blog post of mine as well. This is getting ridiculous. The meme that I’m using as ...

6976 sym Python (3196 sym/3 pcs) 4 img

Reproducible data science with Nix, part 4 — So long, {renv} and Docker, and thanks for all the fish

11.08.2023

For this blog post, I also made a YouTube video that goes over roughly the same ideas, but the blog post is more detailed, as I explain the contents of default.nix files, which I don’t do in the video. Watch the video here. This is the fourth post in a series of posts about Nix. Disclaimer: I’m a super beginner with Nix. So this series of blog p...

18770 sym R (6857 sym/26 pcs) 8 img

Reproducible data science with Nix, part 3 — frictionless {plumber} api deployments with Nix

29.07.2023

This is the third post in a series of posts about Nix. Disclaimer: I’m a super beginner with Nix. So this series of blog posts is more akin to notes that I’m taking while learning than a super detailed tutorial. So if you’re a Nix expert and read something stupid in here, that’s normal. This post is going to focus on R (obviously) but the i...

18709 sym R (2924 sym/22 pcs) 16 img

Reproducible data science with Nix, part 2 — running {targets} pipelines with Nix

18.07.2023

This is the second post in a series of posts about Nix. Disclaimer: I’m a super beginner with Nix. So this series of blog posts is more akin to notes that I’m taking while learning than a super detailed tutorial. So if you’re a Nix expert and read something stupid in here, that’s normal. This post is going to focus on R (obviously) but the ...

12888 sym Python (2009 sym/14 pcs) 4 img

Reproducible data science with Nix

12.07.2023

This is the first post in what will (hopefully) be a series of posts about Nix. Disclaimer: I’m a super beginner with Nix. So this series of blog posts is more akin to notes that I’m taking while learning than a super detailed tutorial. So if you’re a Nix expert and read something stupid in here, that’s normal. This post is going to focus on R (obviously) bu...

12713 sym Python (864 sym/7 pcs) 4 img

Automating checks of *handcrafted* Word tables with {docxtractr}

17.03.2023

Unfortunately, not everyone knows about literate programming, so many tables in Word documents are “generated” by hand (generated is really too strong a word), and what very often happens is that these handcrafted tables have typos. Usually it’s totals that are wrong. Checking the totals in these tables by hand with a pocket calculator is a tedi...

7176 sym R (16649 sym/19 pcs) 6 img

Software engineering techniques that non-programmers who write a lot of code can benefit from — the DRY WIT approach

06.03.2023

Data scientists, statisticians, analysts, researchers, and many other professionals write a lot of code. Not only do they write a lot of code, but they must also read and review a lot of it. They either work in teams and need to review each other’s code, or need to be able to reproduce results from past projects, be it for peer review o...

7195 sym Python (971 sym/4 pcs) 10 img

What I’ve learned making an .epub Ebook with Quarto

02.03.2023

I’ve been working on an ebook (that you can read over here) made using Quarto. Since I’m also selling a DRM-free Epub and PDF on Leanpub, I wanted to share some tips and tricks I’ve learned to generate an Epub that passes epubcheck using Quarto. Quarto is an open-source scientific and technical publishing tool made by Posit. If ...

7203 sym Python (3894 sym/15 pcs) 6 img

A Linux Live USB as a statistical programming dev environment

28.10.2022

This blog post is divided into two parts: in the first part I’ll show you how to create a Linux Live USB with persistent storage that can be used as a development environment, and in the second part I’ll show you the easiest way to set up RStudio and R in Ubuntu. Making your own, portable, development environment based on Ubuntu or Debian I’m ...

6529 sym R (524 sym/4 pcs) 6 img

How to deal with annoying medium sized data inside a Shiny app

30.10.2022

This blog post is taken from a chapter of my ebook on building reproducible analytical pipelines, which you can read here. If you want to follow along, you can start by downloading the data I use here. This is a smaller dataset made from the one you can get here. Uncompressed, it’ll be a 2.4GB file. Not big data in any sense, but big enough to be...

7580 sym R (4397 sym/8 pcs) 2 img