Publications by Arthur Charpentier
Holt-Winters with a Quantile Loss Function
Exponential smoothing is an old technique, but it can perform extremely well on real time series, as discussed in Hyndman, Koehler, Ord & Snyder (2008): when Gardner (2005) appeared, many believed that exponential smoothing should be disregarded because it was either a special case of ARIMA modeling or an ad hoc procedure with no statistical ra...
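To give the flavour of the approach, here is a minimal sketch (not the code from the post): a simple exponential smoothing recursion, rather than the full Holt-Winters one, fitted by minimizing a pinball (quantile) loss with optim(); the series y below is simulated.

    # pinball (quantile) loss of a forecast error u, at quantile level tau
    pinball <- function(u, tau) pmax(tau * u, (tau - 1) * u)

    # one-step-ahead errors of simple exponential smoothing, summed pinball loss
    ses_quantile <- function(par, y, tau) {
      alpha <- plogis(par)              # keep the smoothing parameter in (0,1)
      level <- y[1]
      err   <- numeric(length(y) - 1)
      for (t in 2:length(y)) {
        err[t - 1] <- y[t] - level      # forecast error at time t
        level <- alpha * y[t] + (1 - alpha) * level
      }
      sum(pinball(err, tau))
    }

    set.seed(1)
    y   <- 10 + cumsum(rnorm(200, sd = .5))
    fit <- optim(0, ses_quantile, y = y, tau = .9, method = "BFGS")
    plogis(fit$par)                     # estimated smoothing parameter

Replacing the squared error by the pinball loss is what turns the usual point forecast into a forecast of the 90% quantile; trend and seasonal components can be added to the recursion in the same way.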
Using convolutions (S3) vs distributions (S4)
Usually, to illustrate the difference between S3 and S4 classes in R, I mention glm (from base R) and vglm (from VGAM), which provide similar outputs, but the first is based on S3 code while the second is based on S4 code. Another way to illustrate the difference is to manipulate distributions. Consider the case where we want to sum (independent) random variables....
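As an illustration of the two flavours (a from-scratch sketch, not the code of the post), one can define a discrete distribution as a plain list with an S3 class, overload "+" so that it returns the convolution, and then do exactly the same thing with a formal S4 class:

    # S3: a discrete distribution is a list (support, probabilities) with a class
    disc <- function(x, p) structure(list(x = x, p = p), class = "disc")
    "+.disc" <- function(e1, e2) {
      s  <- outer(e1$x, e2$x, "+")      # all pairwise sums of the supports
      pr <- outer(e1$p, e2$p, "*")      # product of probabilities (independence)
      z  <- tapply(pr, s, sum)
      disc(as.numeric(names(z)), as.numeric(z))
    }
    X <- disc(0:30, dpois(0:30, 2))
    Y <- disc(0:30, dpois(0:30, 3))
    Z <- X + Y                          # close to a Poisson(5), on a truncated support

    # S4: same convolution, with a formal class and an explicit method registration
    setClass("Disc", representation(x = "numeric", p = "numeric"))
    setMethod("+", signature("Disc", "Disc"), function(e1, e2) {
      s  <- outer(e1@x, e2@x, "+")
      pr <- outer(e1@p, e2@p, "*")
      z  <- tapply(pr, s, sum)
      new("Disc", x = as.numeric(names(z)), p = as.numeric(z))
    })
    Z4 <- new("Disc", x = 0:30, p = dpois(0:30, 2)) +
          new("Disc", x = 0:30, p = dpois(0:30, 3))

The S3 version relies on naming conventions only, while the S4 version makes the class and the dispatch explicit, which is essentially the difference illustrated here with distributions.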
When “learning Python” becomes “practicing R” (spoiler)
15 years ago, a student of mine told me that I should start learning Python, that it was really a great language. Students started to learn it, but I kept postponing. A few years ago, I also started Python for Kids with my son, which is actually really nice. That was fun, but not really challenging. A few weeks ago, I also started a crash cours...
Graduate Course on Advanced Tools for Econometrics (2)
This Tuesday, I will be giving the second part of the (crash) graduate course on advanced tools for econometrics. It will take place in Rennes, in the IMAPP room, and I have been told that there will be a video link with Nantes and Angers. Slides for the morning are online, as well as slides for the afternoon. In the morning, we will talk about variable sec...
Some sort of Otto Neurath (isotype picture) map
Yesterday evening, I was walking in Budapest, and I saw a nice map in some sort of Otto Neurath style. It was hand-made, but I thought it should be possible to do it in R, automatically. A few years ago, Baptiste Coulmont published a nice blog post on the osmar package, which can be used to import OpenStreetMap objects (polygons, lines, et...
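For reference, a minimal osmar sketch of the import step described above (the coordinates and the size of the bounding box are placeholders, and the code queries the OpenStreetMap API):

    library(osmar)
    src <- osmsource_api()                          # OpenStreetMap API as data source
    bb  <- center_bbox(19.0402, 47.4979, 800, 800)  # a box around central Budapest
    ua  <- get_osm(bb, source = src)                # nodes, ways and relations
    plot(ua)                                        # quick look at the imported objects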
Classification from scratch, overview 0/8
Before my course on "big data and economics" at the University of Barcelona in July, I wanted to upload a series of posts on classification techniques, to give some insight into machine learning tools. A common saying is that machine learning algorithms are black boxes, and I wanted to come back to that claim. First of all, isn’t it the cas...
Classification from scratch, logistic regression 1/8
Let us start our series on classification from scratch today… The logistic regression is based on the assumption that, given covariates \(\mathbf{x}\), \(Y\) has a Bernoulli distribution, \(Y|\mathbf{X}=\mathbf{x}\sim\mathcal{B}(p_{\mathbf{x}}),~~~~p_\mathbf{x}=\frac{\exp[\mathbf{x}^T\mathbf{\beta}]}{1+\exp[\mathbf{x}^T\mathbf{\beta}]}\) The goal ...
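A standard way to do it from scratch (a generic sketch, not necessarily the post's code) is to write down the log-likelihood of that Bernoulli model and maximize it numerically, then compare with glm():

    # negative log-likelihood of the logistic model
    neg_loglik <- function(beta, X, y) {
      eta <- X %*% beta
      -sum(y * eta - log(1 + exp(eta)))
    }

    set.seed(1)
    x <- rnorm(200)
    y <- rbinom(200, 1, plogis(-1 + 2 * x))
    X <- cbind(1, x)                                # design matrix with an intercept

    fit <- optim(rep(0, ncol(X)), neg_loglik, X = X, y = y, method = "BFGS")
    fit$par                                         # should be close to...
    coef(glm(y ~ x, family = binomial))             # ...the glm() estimates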
Classification from scratch, logistic with splines 2/8
Today, second post of our series on classification from scratch, following the brief introduction to logistic regression. Piecewise linear splines: to illustrate what’s going on, let us start with a “simple” regression (with only one explanatory variable). The underlying idea is natura non facit saltus, for “nature does not make jumps”...
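The piecewise linear part can be obtained with positive-part functions of the covariate at chosen knots; a small sketch (knots and simulated data are arbitrary, just to fix ideas):

    # broken-line (piecewise linear spline) basis: a slope change at each knot
    pos <- function(u) pmax(u, 0)

    set.seed(1)
    x <- runif(200, 0, 10)
    y <- rbinom(200, 1, plogis(-2 + .3 * x + 1.5 * pos(x - 5)))

    # logistic regression on the broken-line basis, knots at 3 and 7
    reg <- glm(y ~ x + pos(x - 3) + pos(x - 7), family = binomial)

    # equivalently, with the degree-1 B-spline basis from the splines package
    library(splines)
    reg2 <- glm(y ~ bs(x, knots = c(3, 7), degree = 1), family = binomial)

Both parametrizations span the same space of continuous piecewise linear functions; only the interpretation of the coefficients changes.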
Classification from scratch, trees 9/8
Ninth post of our series on classification from scratch. Today, we’ll see the heuristics of the algorithm inside classification trees. And yes, I promised eight posts in that series, but clearly that was not sufficient… sorry for the poor prediction. Decision Tree. Decision trees are easy to read. So easy to read that they are everywhere. We...
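The heuristic itself is short to write down: for each candidate split, compare the impurity (e.g. the Gini index) of the two leaves it creates, and keep the split that reduces it the most. A stand-alone sketch for one continuous variable (not the code of the post):

    # Gini impurity of a 0/1 vector
    gini <- function(y) { p <- mean(y); 2 * p * (1 - p) }

    # weighted impurity of the split x <= s versus x > s
    split_impurity <- function(s, x, y) {
      left <- x <= s
      mean(left) * gini(y[left]) + mean(!left) * gini(y[!left])
    }

    set.seed(1)
    x <- runif(500)
    y <- rbinom(500, 1, ifelse(x < .4, .1, .8))

    # scan candidate thresholds, keep the one minimizing the weighted impurity
    s_grid <- head(sort(unique(x)), -1)   # drop the largest value so both leaves are non-empty
    best   <- s_grid[which.min(sapply(s_grid, function(s) split_impurity(s, x, y)))]
    best                                  # should be close to 0.4

Growing the tree then amounts to applying the same scan recursively inside each leaf, with some stopping rule.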
Classification from scratch, logistic with kernels 3/8
Third post of our series on classification from scratch, following the previous post introducing smoothing techniques with b-splines. Consider here kernel-based techniques. Note that here, we do not use the “logistic” model… it is purely non-parametric. Kernel-based estimation, from scratch: I like kernels because they are somehow very int...
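The basic object is a local weighted average, the Nadaraya-Watson estimator of \(\mathbb{P}(Y=1|X=x)\); a minimal sketch with a Gaussian kernel and an arbitrary bandwidth:

    # kernel estimate of P(Y = 1 | X = x0): weighted mean of the y's,
    # with Gaussian weights centred at x0
    p_hat <- function(x0, x, y, h) {
      w <- dnorm((x - x0) / h)
      sum(w * y) / sum(w)
    }

    set.seed(1)
    x <- runif(300, 0, 10)
    y <- rbinom(300, 1, plogis(-3 + .6 * x))

    u <- seq(0, 10, length = 101)
    plot(u, sapply(u, function(x0) p_hat(x0, x, y, h = .5)), type = "l",
         xlab = "x", ylab = "estimated P(Y=1|X=x)")
    points(x, y, pch = 19, cex = .3)

The bandwidth h plays the same role as the number of knots in the spline post: it controls how local, hence how wiggly, the estimated probability is.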