Publications by Econometrics and Free Software
R or Python? Why not both? Using Anaconda Python within R with {reticulate}
This short blog post illustrates how easy it is to use R and Python in the same R Notebook thanks to the {reticulate} package. For this to work, you might need to upgrade RStudio to the current preview version. Let’s start by loading {reticulate} with library(reticulate). {reticulate} is an RStudio package that provides “a comprehensive set of t...
3321 sym R (4459 sym/12 pcs) 6 img
Looking into 19th century ads from a Luxembourgish newspaper with R
The National Library of Luxembourg published some very interesting data sets: scans of historical newspapers! There are several data sets that you can download, from 250 MB up to 257 GB. I decided to take a look at the 32 GB “ML Starter Pack”. It contains high-quality scans of one year of L’indépendance luxembourgeoise (Luxembourgish ind...
9086 sym R (5688 sym/9 pcs) 16 img
Making sense of the METS and ALTO XML standards
Last week I wrote a blog post in which I analyzed one year of newspaper ads from 19th century newspapers. The data is made available by the National Library of Luxembourg. In this blog post, which is part 1 of a two-part series, I extract data from the 257 GB archive, which contains 10 years of publications of L’Union, another 19th century Luxem...
5606 sym R (6055 sym/13 pcs) 12 img
Using Data Science to read 10 years of Luxembourgish newspapers from the 19th century
I have been playing around with historical newspaper data (see here and here). I have extracted the data from the largest archive available, as described in the previous blog post, and have now created a Shiny dashboard where it is possible to visualize the most common words per article, as well as read a summary of each article. The summary was made ...
1798 sym 4 img
Building a shiny app to explore historical newspapers: a step-by-step guide
Introduction: I started off this year by exploring a world that was unknown to me, the world of historical newspapers. I did not know that historical newspaper data was a thing, and have been thoroughly enjoying myself exploring the different datasets published by the National Library of Luxembourg. You can find the data here. In my first blog po...
9029 sym R (14593 sym/9 pcs) 12 img
Manipulating strings with the {stringr} package
This blog post is an excerpt of my ebook Modern R with the tidyverse, which you can read for free here. This is taken from Chapter 4, in which I introduce the {stringr} package. Manipulate strings with {stringr}: {stringr} contains functions to manipulate strings. In Chapter 10, I will teach you about regular expressions, but the functions containe...
11329 sym R (18328 sym/45 pcs) 4 img
Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit, part 1
Can I get enough of historical newspaper data? Seems like I can’t. I have already written four (1, 2, 3 and 4) blog posts, but there’s still a lot to explore. This blog post uses a new batch of data announced on Twitter: For all who love to analyse text, the BnL released half a million of processed newspaper articles. Historical news from 1841-1878...
7519 sym R (8654 sym/13 pcs) 4 img
Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit
Can I get enough of historical newspaper data? Seems like I can’t. I have already written four (1, 2, 3 and 4) blog posts, but there’s still a lot to explore. This blog post uses a new batch of data announced on Twitter: For all who love to analyse text, the BnL released half a million of processed newspaper articles. Historical news from 1841-1878...
7519 sym R (8618 sym/13 pcs) 4 img
Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit, part 2
In part 1 of this series I set up Vowpal Wabbit to classify newspaper content. Now, let’s use the model to make predictions and see whether and how we can improve it. Then, let’s train the model on the whole data. Step 1: prepare the data. The first step consists in importing the test data and preparing it. The test data need not be large ...
3448 sym R (8052 sym/13 pcs) 4 img