Publications by David Lindelöf

Machine Learning in R: Start with an End-to-End Test

13.11.2019

As a data scientist, you will likely be asked one day to automate your analysis and port your models to production environments. When that happens, you cross the blurry line between data science and software engineering, and become a machine learning engineer. I’d like to share a few tips on how to make that transition as successful as possible....

9248 sym R (5356 sym/19 pcs)

Monty Hall: a programmer’s explanation

02.10.2020

I take it we’re all familiar with the infamous Monty Hall problem: Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say A, and the host, who knows what’s behind the doors, opens another door, say C, which has a goat. He then says to you, “Do y...

6932 sym R (746 sym/10 pcs) 2 img
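
The puzzle quoted above is usually resolved by asking whether switching doors improves your odds. The following is a minimal Python sketch of that check by simulation, not the author's code. The function names `play` and `win_rate`, the trial count, and the seed are assumptions made for illustration.

```python
import random


def play(switch, rng):
    """Play one round; return True if the final pick hides the car."""
    doors = ["A", "B", "C"]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the car is and opens a goat door
    # that is not the contestant's current pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning under a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}")
print(f"switch: {switch_rate:.3f}")
```

Under these assumptions the estimates should land close to 1/3 for staying and 2/3 for switching, which is the textbook answer to the puzzle.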

A/B testing my resume

24.11.2020

Internet wisdom is divided on whether one-page resumes are more effective at landing you an interview than two-page ones. Most of the advice out there seems largely opinion- or anecdote-based, with very little scientific basis. Well, let’s fix that. Being currently open to work, I thought this would be the right time to test this scientifically. ...

4652 sym R (1264 sym/6 pcs) 2 img

No, you have not controlled for confounders

10.02.2021

When observational data includes a treatment indicator and some possible confounders, it is very tempting to simply regress the outcome on all features (confounders and treatment alike), extract the coefficients associated with the treatment indicator, and proudly proclaim that “we have controlled for confounders and estimated the treatment eff...

6953 sym R (7481 sym/31 pcs) 2 img 1 tbl

Feature standardization considered harmful

11.06.2021

Many statistical learning algorithms perform better when the covariates are on similar scales. For example, it is common practice to standardize the features used by an artificial neural network so that the gradient of its objective function doesn’t depend on the physical units in which the features are described. The same advice is frequently ...

2133 sym R (438 sym/4 pcs) 8 img
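
For readers unfamiliar with the practice described in the excerpt above, here is a minimal Python sketch of feature standardization (z-scoring). It only illustrates the common practice, not the post's argument about it. The helper name `standardize` and the sample heights are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    center = mean(values)
    scale = stdev(values)
    return [(v - center) / scale for v in values]


# The same hypothetical feature expressed in two physical units.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_m = [h / 100 for h in heights_cm]

z_cm = standardize(heights_cm)
z_m = standardize(heights_m)

# After standardization the choice of unit no longer matters.
print(z_cm)
print(z_m)
```

Both lists come out the same up to floating-point rounding, which is the sense in which a standardized feature does not depend on the units it was measured in.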