Publications by Econometrics and Free Software
{disk.frame} is epic
Note: When I started writing this blog post, I encountered a bug and filed a bug report that I encourage you to read. The responsiveness of the developer was exemplary. Not only did Zhuo solve the issue in record time, he provided ample code snippets to illustrate the solutions. Hats off to him! This blog post is a short presentation of {disk.fra...
4969 sym R (1124 sym/7 pcs) 6 img
Split-apply-combine for Maximum Likelihood Estimation of a linear model
Intro Maximum likelihood estimation is a very useful technique for fitting a model to data. It is used a lot in econometrics and other sciences but seems, at least to my knowledge, not to be so well known among machine learning practitioners (but I may be wrong about that). Other useful techniques used in econometrics to confront models with data are the minimum...
6177 sym R (1856 sym/5 pcs) 8 img
Cluster multiple time series using K-means
I have recently been confronted with the issue of finding similarities among time series and thought about using k-means to cluster them. To illustrate the method, I’ll be using data from the Penn World Tables, readily available in R (inside the {pwt9} package): library(tidyverse) library(lubridate) library(pwt9) library(brotools) First of all, l...
2335 sym R (4627 sym/10 pcs) 8 img
Multiple data imputation and explainability
Introduction Imputing missing values is quite an important task, but in my experience, very often, it is performed using very simplistic approaches. The basic approach is to impute missing values for numerical features using the average of each feature, or using the mode for categorical features. There are better ways of imputing missing values, ...
12555 sym R (30783 sym/30 pcs) 18 img
Instrumental variable regression and machine learning
Intro Just as a lot of ink has been spilled over the question “what’s the difference between machine learning and statistics” (since at least Breiman (2001)), the same question with statistics replaced by econometrics has led to a lot of discussion as well. I like this presentation by Hal Varian from almost 6 years ago. There’s a slide call...
14591 sym R (13242 sym/23 pcs) 14 img