Publications by Econometrics and Free Software

{disk.frame} is epic

02.09.2019

Note: When I started writing this blog post, I encountered a bug and filed a bug report that I encourage you to read. The responsiveness of the developer was exemplary. Not only did Zhuo solve the issue in record time, he provided ample code snippets to illustrate the solutions. Hats off to him! This blog post is a short presentation of {disk.fra...

4969 sym R (1124 sym/7 pcs) 6 img

Split-apply-combine for Maximum Likelihood Estimation of a linear model

04.10.2019

Intro: Maximum likelihood estimation is a very useful technique for fitting a model to data. It is used a lot in econometrics and other sciences but seems, at least to my knowledge, not to be so well known among machine learning practitioners (but I may be wrong about that). Other useful techniques to confront models with data used in econometrics are the minimum...

6177 sym R (1856 sym/5 pcs) 8 img

Cluster multiple time series using K-means

12.10.2019

I was recently confronted with the problem of finding similarities among time series and thought about using k-means to cluster them. To illustrate the method, I’ll be using data from the Penn World Tables, readily available in R (inside the {pwt9} package): library(tidyverse) library(lubridate) library(pwt9) library(brotools) First of all, l...

2335 sym R (4627 sym/10 pcs) 8 img

Multiple data imputation and explainability

01.11.2019

Introduction: Imputing missing values is an important task, but in my experience it is very often performed using simplistic approaches. The basic approach is to impute missing values for numerical features using the average of each feature, or using the mode for categorical features. There are better ways of imputing missing values, ...

12555 sym R (30783 sym/30 pcs) 18 img

Instrumental variable regression and machine learning

08.11.2019

Intro: Just as the question “what’s the difference between machine learning and statistics” has caused a lot of ink to be spilled (since at least Breiman (2001)), the same question with statistics replaced by econometrics has led to a lot of discussion as well. I like this presentation by Hal Varian from almost 6 years ago. There’s a slide call...

14591 sym R (13242 sym/23 pcs) 14 img
