Publications by Arthur Charpentier
What is the interpretation of the diagonal for a ROC curve
Last Friday, we discussed the use of ROC curves to describe the goodness of a classifier. I said that I would post a brief paragraph on the interpretation of the diagonal. If you look around, some say that it describes the “strategy of randomly guessing a class”, that it is obtained with “a diagnostic test that is no better than chance lev...
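The "random guessing" reading of the diagonal can be checked by simulation. A rule that ignores the covariates and declares "positive" with probability t has, in expectation, TPR = FPR = t, so sweeping t from 0 to 1 traces the diagonal. The sketch below is illustrative only: the simulated labels, the 30% prevalence, the sample size and the seed are arbitrary assumptions, not taken from the post.

```python
import random

def random_guess_roc_point(labels, t, rng):
    """Classifier that ignores the features and predicts 'positive'
    with probability t; returns its (FPR, TPR) point."""
    tp = fp = pos = neg = 0
    for y in labels:
        pred = rng.random() < t  # coin flip, independent of the true label y
        if y == 1:
            pos += 1
            tp += pred
        else:
            neg += 1
            fp += pred
    return fp / neg, tp / pos

rng = random.Random(42)
# Hypothetical data: about 30% positives among 50,000 observations.
labels = [1 if rng.random() < 0.3 else 0 for _ in range(50_000)]

thresholds = (0.1, 0.25, 0.5, 0.75, 0.9)
points = [random_guess_roc_point(labels, t, rng) for t in thresholds]
for t, (fpr, tpr) in zip(thresholds, points):
    print(f"t = {t:.2f}  FPR = {fpr:.3f}  TPR = {tpr:.3f}")
```

Each printed pair should sit close to (t, t), that is, on the diagonal, up to sampling noise.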
Estimates on training vs. validation samples
Before moving to cross-validation, it was natural to say “I will burn 50% (say) of my data to train a model, and then use the remaining to fit the model”. For instance, we can use training data for variable selection (e.g. using some stepwise procedure in a logistic regression), and then, once variables have been selected, fit the model on the...
Pareto Models for Top Incomes
With Emmanuel Flachaire, we uploaded to HAL a paper on Pareto Models for Top Incomes. Top incomes are often related to the Pareto distribution. To date, economists have mostly used the Pareto Type I distribution to model the upper tail of income and wealth distributions. It is a parametric distribution, with an attractive property, that can be easily lin...
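As a reminder of what fitting a Pareto Type I tail involves, here is a minimal sketch assuming the threshold x_min is known: the survival function is S(x) = (x/x_min)^(-alpha) for x >= x_min, draws come from inverse-CDF sampling, and alpha is estimated by the standard maximum likelihood formula alpha_hat = n / sum(log(x_i/x_min)). This is a textbook estimator chosen for illustration, not necessarily the method used in the paper; the values alpha = 2.5 and x_min = 1 are arbitrary.

```python
import math
import random

def simulate_pareto1(n, alpha, x_min, rng):
    """Draw n values from a Pareto Type I distribution by inverse-CDF
    sampling: if U ~ Uniform(0, 1], then x_min * U**(-1/alpha) has
    survival function S(x) = (x / x_min) ** (-alpha) for x >= x_min."""
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a negative power.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def pareto1_mle(xs, x_min):
    """Maximum likelihood estimate of the tail index alpha, x_min known."""
    return len(xs) / sum(math.log(x / x_min) for x in xs)

rng = random.Random(1)
xs = simulate_pareto1(50_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = pareto1_mle(xs, x_min=1.0)
print(f"alpha_hat = {alpha_hat:.3f}")
```

With this many draws, the estimate should be close to the true value 2.5; its standard error is roughly alpha / sqrt(n).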
On my way to Manizales (Colombia)
Next week, I will be in Manizales, Colombia, for the Third International Congress on Actuarial Science and Quantitative Finance. I will be giving a lecture on Wednesday with Jed Frees and Emiliano Valdez. I will give my course on Algorithms for Predictive Modeling on Thursday morning (after Jed and Emil’s lectures). Unfortunately, my computer...
Optimal transport on large networks
With Alfred Galichon and Lucas Vernet, we recently uploaded a paper entitled Optimal Transport on Large Networks on arXiv. This article presents a set of tools for the modeling of a spatial allocation problem in a large geographic market and gives examples of applications. In our setting, the market is described by a network that maps the cost o...
Insurance data science : use and value of unusual data #1
Next week, I will be at the Summer School of the Swiss Association of Actuaries, in Lausanne, with Jean-Philippe Boucher (UQAM) and Ewen Gallic (AMSE). I will give an introductory talk on Monday morning, and the slides are now available. There will be some hands-on applications in R, and I will share some code in the slides.
Insurance data science : Pictures
At the Summer School of the Swiss Association of Actuaries, in Lausanne, following Jean-Philippe Boucher's (UQAM) part on telematics data, I will start talking about pictures this Wednesday. Slides are available online. Ewen Gallic (AMSE) will present a tutorial on satellite pictures, and a simple classification problem, related to Alzheimer ...
Insurance data science : Text
At the Summer School of the Swiss Association of Actuaries, in Lausanne, I will start talking about text-based data and NLP this Thursday. Slides are available online. Ewen Gallic (AMSE) will present a tutorial on tweets. I can upload a few additional slides on LSTMs (recurrent neural nets).
Insurance data science : Networks
At the Summer School of the Swiss Association of Actuaries, in Lausanne, I will start talking about networks and insurance this Friday. Slides are available online.
On leverage
Last week, in our STT5100 (applied linear models) class, I introduced the hat matrix and the notion of leverage. In a classical regression model, \(\boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\) (in matrix form), the ordinary least squares estimator of the parameter \(\boldsymbol{\beta}\) is \(\widehat{\boldsymbol{\beta}}=(\boldsymbol{X}^\to...
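For the simple regression y = b0 + b1 x + e (design matrix with an intercept column and one covariate), the diagonal of the hat matrix H = X(X^T X)^{-1} X^T has the closed form h_ii = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2. The minimal sketch below uses that formula rather than a general matrix inversion; the covariate values are hypothetical, chosen so that one point sits far from the others.

```python
def leverages(x):
    """Diagonal of the hat matrix for the simple regression
    y = b0 + b1 * x + e, via the closed form
    h_ii = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Hypothetical covariate: nine regularly spaced points and one far to the right.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
h = leverages(x)
print([round(v, 3) for v in h])
print(sum(h))  # trace(H) equals the number of parameters, here 2 (up to rounding)
```

The isolated point at x = 30 gets a leverage of about 0.91, close to the upper bound of 1, while the average leverage is p/n = 2/10.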