Publications by Econometrics and Free Software
Maps with pie charts on top of each administrative division: an example with Luxembourg’s elections data
Abstract: You can find the data used in this blog post here: https://github.com/b-rodrigues/elections_lux. This is a follow-up to a previous blog post in which I extracted data on the 2018 Luxembourgish elections from Excel workbooks. Now that I have the data, I will create a map of Luxembourg by commune, with pie charts of the results on top of each...
From webscraping data to releasing it as an R package to share with the world: a full tutorial with data from NetHack
“If someone told me a decade ago (back before I'd ever heard the term ‘roguelike’) what I'd be doing today, I would have trouble believing this… Yet here we are.” (Josh Ge, @GridSageGames, June 21, 2018)
Abstract: In this post, I am going to show you how you can scrape tables from a website, and then create a packag...
Analyzing NetHack data, part 1: What kills the players
Abstract: In this post, I will analyse the data I scraped and put into an R package, which I called {nethack}. NetHack is a roguelike game; for more context, read my previous blog post. You can play around with the data yourself by installing the {nethack} package from GitHub: devtools::install_github("b-rodrigues/nethack") And to ...
Analyzing NetHack data, part 2: What players kill the most
Link to webscraping the data. Link to Analysis, part 1.
Introduction: This is the third blog post that deals with data from the game NetHack, and oh boy, did a lot of things happen since the last blog post! Here’s a short timeline of the events: I scraped data from alt.org/nethack and made a package with the data available on GitHub (that packag...
Easy time-series prediction with R: a tutorial with air traffic data from Lux Airport
In this blog post, I will show you how you can quickly and easily forecast a univariate time series. I am going to use data from the EU Open Data Portal on air passenger transport. You can find the data here. I downloaded the data in the TSV format for Luxembourg Airport, but you could repeat the analysis for any airport. Once you have the data, ...
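The forecasting workflow described above can be sketched with base R alone. This is a minimal sketch, not the code from the post: it assumes the built-in AirPassengers series (monthly totals of international airline passengers, 1949 to 1960) as a stand-in for the Lux Airport data, and it fixes a seasonal ARIMA order by hand (the classic "airline model") instead of selecting one automatically.

```r
# Minimal sketch: forecast a univariate monthly series with base R only.
# ASSUMPTION: AirPassengers stands in for the Lux Airport data, and the
# ARIMA(0,1,1)(0,1,1)[12] order is chosen by hand, not selected automatically.
data("AirPassengers", package = "datasets")

# Hold out the last 12 months to check the forecast
train <- window(AirPassengers, end = c(1959, 12))
test  <- window(AirPassengers, start = c(1960, 1))

# Fit a seasonal ARIMA on the log scale to stabilise the variance
fit <- arima(log(train),
             order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 12 months ahead and transform back to passenger counts
fc <- predict(fit, n.ahead = 12)
point_forecast <- exp(fc$pred)

# Mean absolute percentage error on the held-out year
mape <- mean(abs(test - point_forecast) / test) * 100
print(round(mape, 2))
```

Holding out the final year gives an honest check of the forecast; for real data you would swap in your own monthly ts object.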
Searching for the optimal hyper-parameters of an ARIMA model in parallel: the tidy gridsearch approach
Introduction: In this blog post, I’ll use the data that I cleaned in a previous blog post, which you can download here. If you want to follow along, download the monthly data. In the previous blog post, I used the auto.arima() function to very quickly get a “good-enough” model to predict the future monthly total of passengers flying from LuxAirport...
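A grid search over ARIMA hyper-parameters, run in parallel, might look roughly like this in base R. It is a sketch under stated assumptions, not the post's "tidy" implementation: AirPassengers stands in for the monthly Lux Airport data, the seasonal part is fixed to (0,1,1)[12], only the non-seasonal orders p, d, q are searched, and candidates are ranked by RMSE on a held-out year. Parallelism uses parallel::mclapply, which forks processes and therefore falls back to one core on Windows.

```r
# Minimal sketch of a parallel grid search over ARIMA orders in base R.
# ASSUMPTIONS: AirPassengers stands in for the monthly data, the seasonal part
# is fixed to (0,1,1)[12], and models are ranked by hold-out RMSE.
library(parallel)

train <- window(log(AirPassengers), end = c(1959, 12))
test  <- window(AirPassengers, start = c(1960, 1))

# Every combination of the non-seasonal hyper-parameters p, d, q
grid <- expand.grid(p = 0:2, d = 0:1, q = 0:2)

# Fit one candidate model and return its hold-out RMSE (NA if the fit fails)
fit_one <- function(i) {
  ord <- unlist(grid[i, c("p", "d", "q")])
  tryCatch({
    fit <- arima(train, order = ord,
                 seasonal = list(order = c(0, 1, 1), period = 12))
    pred <- exp(predict(fit, n.ahead = 12)$pred)
    sqrt(mean((test - pred)^2))
  }, error = function(e) NA_real_)
}

# Forking is unavailable on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
grid$rmse <- unlist(mclapply(seq_len(nrow(grid)), fit_one, mc.cores = n_cores))

# Best combination first (failed fits, with NA, sort last)
grid <- grid[order(grid$rmse), ]
print(head(grid, 3))
```

Ranking by hold-out error rather than AIC keeps models with different differencing orders comparable.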
Using a genetic algorithm for the hyperparameter optimization of a SARIMA model
Introduction: In this blog post, I’ll use the data that I cleaned in a previous blog post, which you can download here. If you want to follow along, download the monthly data. In my last blog post, I showed how to perform a grid search the “tidy” way. As an example, I looked for the right hyperparameters of a SARIMA model. However, the goal o...
The best way to visit Luxembourgish castles is doing data science + combinatorial optimization
Inspired by David Schoch’s blog post, Traveling Beerdrinker Problem. Check out his blog; he has some amazing posts!
Introduction: Luxembourg, like any proper European country, is full of castles. According to Wikipedia, “By some optimistic estimates, there are as many as 130 castles in Luxembourg but more realistically there are probably just o...
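Planning a round trip through a set of castles is a travelling salesman problem. The post's exact optimization method is not given in the abstract, so the following is only a minimal base-R sketch of one common heuristic: build a nearest-neighbour tour, then improve it with 2-opt segment reversals. The coordinates are random stand-ins, not real castle locations, and distances are Euclidean rather than road distances.

```r
# Minimal sketch: approximate a shortest round trip (travelling salesman problem)
# with a nearest-neighbour tour improved by 2-opt moves.
# ASSUMPTIONS: random coordinates stand in for castle locations; distances are Euclidean.
set.seed(42)
n <- 12
coords <- cbind(x = runif(n), y = runif(n))
d <- as.matrix(dist(coords))

# Length of a closed tour: consecutive legs plus the leg back to the start
tour_length <- function(tour, d) {
  sum(d[cbind(tour, c(tour[-1], tour[1]))])
}

# Greedy construction: always travel to the closest unvisited point
nearest_neighbour <- function(d, start = 1) {
  tour <- start
  remaining <- setdiff(seq_len(nrow(d)), start)
  while (length(remaining) > 0) {
    last <- tour[length(tour)]
    nxt <- remaining[which.min(d[last, remaining])]
    tour <- c(tour, nxt)
    remaining <- setdiff(remaining, nxt)
  }
  tour
}

# 2-opt: reverse a segment whenever that shortens the tour; the start stays fixed
two_opt <- function(tour, d) {
  best <- tour_length(tour, d)
  improved <- TRUE
  while (improved) {
    improved <- FALSE
    for (i in 2:(length(tour) - 1)) {
      for (j in (i + 1):length(tour)) {
        candidate <- tour
        candidate[i:j] <- rev(tour[i:j])
        len <- tour_length(candidate, d)
        if (len < best - 1e-12) {
          tour <- candidate
          best <- len
          improved <- TRUE
        }
      }
    }
  }
  tour
}

nn  <- nearest_neighbour(d)
opt <- two_opt(nn, d)
cat("nearest neighbour:", round(tour_length(nn, d), 3), "\n")
cat("after 2-opt:      ", round(tour_length(opt, d), 3), "\n")
```

Both steps are heuristics, so the result is a good tour rather than a guaranteed optimum.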
A tutorial on tidy cross-validation with R
Introduction: This blog post uses several packages from the {tidymodels} collection, namely {recipes}, {rsample} and {parsnip}, to train a random forest the tidy way. I will also use {mlrMBO} to tune the hyper-parameters of the random forest.
Set up: Let’s load the needed packages:
library("tidyverse")
library("tidymodels")
libra...
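The idea behind cross-validation can be shown without any extra packages. This is a minimal sketch, not the post's {tidymodels} pipeline: the built-in mtcars data and a linear model stand in for the post's data and random forest, and the folds are built by hand rather than with {rsample}'s vfold_cv().

```r
# Minimal sketch of k-fold cross-validation in base R.
# ASSUMPTIONS: mtcars and a linear model stand in for the post's data and
# random forest; folds are assigned by hand instead of with {rsample}.
set.seed(123)
k <- 5
data("mtcars", package = "datasets")

# Randomly assign each row to one of k folds of (almost) equal size
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

cv_rmse <- vapply(seq_len(k), function(fold) {
  train <- mtcars[folds != fold, ]
  test  <- mtcars[folds == fold, ]
  fit  <- lm(mpg ~ wt + hp, data = train)   # fit on the other k - 1 folds
  pred <- predict(fit, newdata = test)      # evaluate on the held-out fold
  sqrt(mean((test$mpg - pred)^2))
}, numeric(1))

print(round(cv_rmse, 2))
cat("mean CV RMSE:", round(mean(cv_rmse), 2), "\n")
```

Averaging the per-fold errors gives a less noisy estimate of out-of-sample performance than a single train/test split.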
What hyper-parameters are, and what to do with them; an illustration with ridge regression
This blog post is an excerpt from my ebook Modern R with the tidyverse, which you can read for free here. It is taken from Chapter 7, which deals with statistical models. In the text below, I explain what hyper-parameters are, and as an example I run a ridge regression using the {glmnet} package. The book is still being written, so comments are mor...
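What the ridge hyper-parameter lambda does can be seen directly in the closed-form solution, beta = (X'X + lambda I)^(-1) X'y. This is a sketch that swaps {glmnet} for plain linear algebra; it assumes the built-in mtcars data, and its lambda is not on the same scale as the lambda used by {glmnet}, which scales the loss differently.

```r
# Minimal sketch of ridge regression in closed form, to show the effect of lambda.
# ASSUMPTIONS: mtcars stands in for the book's data; this lambda is not on the
# same scale as {glmnet}'s lambda.
data("mtcars", package = "datasets")

X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp", "qsec")]))  # standardised predictors
y <- mtcars$mpg - mean(mtcars$mpg)  # centred response, so the unpenalised intercept drops out

ridge <- function(X, y, lambda) {
  # beta = (X'X + lambda * I)^(-1) X'y
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

lambdas <- c(0, 1, 10, 100)
coefs <- sapply(lambdas, function(l) ridge(X, y, l))
colnames(coefs) <- paste0("lambda=", lambdas)
print(round(coefs, 3))

# The L2 norm of the coefficient vector shrinks as lambda grows
print(round(sqrt(colSums(coefs^2)), 3))
```

With lambda = 0 the solution is ordinary least squares; larger values trade a little bias for lower variance, which is why lambda is usually chosen by cross-validation.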