Publications by Arthur Charpentier
Computational Time of Predictive Models
Tuesday, at the end of my 5-hour crash course on machine learning for actuaries, Pierre asked me an interesting question about the computational time of different techniques. I had been presenting the philosophy of various algorithms, but I forgot to mention computational time. I wanted to try several classification algorithms on the dataset used to ...
4151 sym R (6386 sym/28 pcs) 8 img
Playing with Leaflet (and Radar locations)
Yesterday, my friend Fleur showed me some interesting features of the leaflet package in R. library(leaflet) To illustrate, consider the locations of (fixed) radars in several European countries. To get the data, use download.file("http://carte-gps-gratuite.fr/radars/zones-de-danger-destinator.zip","radar.zip") unzip("radar.zip") ext_r...
1539 sym R (1098 sym/5 pcs) 6 img
Visualising a Circular Density
This afternoon, Jean-Luc asked me for help with an old post I published, minuit, l’heure du crime, and with some graphs published a few days later in another post, where I used a different visualisation. The idea is that the hour can be seen as circular, in the sense that 23:58 is actually very close to 00:03. So when we use a nonparametric ker...
1969 sym R (1001 sym/3 pcs) 8 img
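The circular idea in the teaser above can be sketched in a few lines of R. This is only a minimal illustration, not the code from the post: the sample `hours` is simulated, the bandwidth is an arbitrary assumption, and the wrapping trick (copying the sample one period to the left and to the right) is one simple way among several to make a kernel estimate periodic.

```r
# Hypothetical sample of event times, in hours on [0, 24), clustered around midnight
set.seed(1)
hours <- rnorm(200, mean = 0, sd = 1.5) %% 24

# Copy the sample one period to the left and to the right, so that
# observations near 23:58 also contribute to the estimate near 00:03
wrapped <- c(hours - 24, hours, hours + 24)

# Gaussian kernel density of the tripled sample, evaluated on [0, 24]
# (bw = 1 hour is an arbitrary choice made for this sketch)
d <- density(wrapped, bw = 1, from = 0, to = 24, n = 241)

# Only one third of the tripled sample lies in [0, 24), so rescale by 3
circ <- data.frame(hour = d$x, density = 3 * d$y)
```

Plotting `circ$density` against `circ$hour` should give a curve that takes roughly the same value at 0 and at 24, which a plain `density(hours)` would not guarantee.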
Tests, Power and Significance
In the mathematical statistics course today, we started talking about tests, and decision rules. To illustrate all the concepts introduced today, we considered the case where we have a sample with . And we want to test against In the course, we’ve seen that we could use a test based on the order statistics . The test would be i.e. ...
2047 sym R (506 sym/5 pcs) 60 img
Statistical Tests: Asymptotic, Exact, or Based on Simulations?
This morning, in our mathematical statistics course, we’ve been discussing the ‘proportion test’, i.e. given a sample of Bernoulli trials, with , we want to test against A natural test (which can be related to the maximum likelihood ratio test) is based on the statistic The test function is here To get the bounds of the acceptance re...
1545 sym R (868 sym/5 pcs) 40 img
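The three approaches named in the title above can be compared on a toy example. The sketch below is not taken from the post: the sample size `n`, the null value `p0`, and the simulated sample are all assumptions chosen for illustration, and `prop.test` and `binom.test` from base R stand in for the asymptotic and exact tests.

```r
set.seed(1)
n  <- 20      # assumed sample size
p0 <- 0.5     # assumed value of the proportion under the null hypothesis
x  <- rbinom(n, size = 1, prob = 0.5)   # hypothetical Bernoulli sample
s  <- sum(x)                            # observed number of successes

# Asymptotic test: Gaussian approximation, without continuity correction
p_asym <- prop.test(s, n, p = p0, correct = FALSE)$p.value

# Exact test: based on the binomial distribution of the number of successes
p_exact <- binom.test(s, n, p = p0)$p.value

# Simulation-based test: draw the number of successes under the null and
# count how often it lies at least as far from n * p0 as the observed one
sims  <- rbinom(10000, size = n, prob = p0)
p_sim <- mean(abs(sims - n * p0) >= abs(s - n * p0))
```

With `p0 = 0.5` the null distribution is symmetric, so `p_sim` should be close to `p_exact` up to Monte Carlo error, while `p_asym` can differ more noticeably for a sample this small.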
Applications of Chi-Square Tests
This morning, in our mathematical statistics class, we’ve seen the use of the chi-square test. The first one was related to the goodness of fit of a multinomial distribution. Assume that . In order to test against , use the statistic Under , . For instance, we have the number of weddings, in a large city, per season, > n=c(301,356,413,...
1929 sym R (1377 sym/14 pcs) 48 img
Variable Importance with Correlated Features
Variable importance graphs are a great tool to see, in a model, which variables are interesting. Since we usually use them with random forests, it looks like they work well with (very) large datasets. The problem with large datasets is that a lot of features are ‘correlated’, and in that case, interpretation of the values of variable importanc...
2732 sym R (1072 sym/3 pcs) 48 img
Profile Likelihood
Consider some simulated data > set.seed(1) > x=exp(rnorm(100)) Assume that those data are observed i.i.d. random variables with distribution, with . The natural idea is to consider the maximum likelihood estimator For instance, consider some maximum likelihood estimator, > library(MASS) > (F=fitdistr(x,"gamma")) shape rate 1.42144...
1727 sym R (1182 sym/6 pcs) 30 img
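For the Gamma fit in the teaser above, a profile likelihood for the shape parameter can be sketched with base R only. This is an illustration under stated assumptions rather than the post's own code: the helper `profile_loglik`, the search interval, and the grid are hypothetical choices. It relies on the fact that, for a fixed shape, the rate maximising the Gamma likelihood is the shape divided by the sample mean.

```r
set.seed(1)
x <- exp(rnorm(100))   # same simulated sample as in the teaser

# Profile log-likelihood of the shape: the rate is replaced by its
# maximiser for that shape, namely shape / mean(x)
profile_loglik <- function(shape) {
  sapply(shape, function(a) {
    sum(dgamma(x, shape = a, rate = a / mean(x), log = TRUE))
  })
}

# Maximise the profile over an assumed search interval
opt <- optimize(profile_loglik, interval = c(0.1, 10), maximum = TRUE)
shape_hat <- opt$maximum

# Approximate 95% interval: shapes whose profile log-likelihood lies
# within qchisq(0.95, 1) / 2 of the maximum (grid bounds are assumptions)
grid <- seq(0.5, 3, by = 0.01)
keep <- profile_loglik(grid) >= opt$objective - qchisq(0.95, df = 1) / 2
ci <- range(grid[keep])
```

The value `shape_hat` should agree with the shape reported by `fitdistr(x, "gamma")`, since maximising the profile is equivalent to maximising the full likelihood.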
Additional thoughts about ‘Lorenz curves’ to compare models
A few months ago, I mentioned a graph of some so-called Lorenz curves to compare regression models, see e.g. Progressive’s slides (thanks Guillaume for the reference). The idea is simple. Consider some model for the pure premium (in insurance, it is the quantity that we like to model), i.e. the conditional expected value On some dataset, we...
3700 sym R (3232 sym/12 pcs) 44 img
Inter-relationships in a matrix
Last week, I wanted to display inter-relationships between data in a matrix. My friend Fleur, from AXA, mentioned an interesting possible application, in car accidents. In car-against-car accidents, it might be interesting to see which parts of the cars were involved. On https://www.data.gouv.fr/fr/, we can find such a dataset, with a lot o...
2114 sym R (825 sym/8 pcs) 6 img