Publications by Arthur Charpentier

Computational Time of Predictive Models

25.09.2015

Tuesday, at the end of my 5-hour crash course on machine learning for actuaries, Pierre asked me an interesting question about the computational time of different techniques. I’ve been presenting the philosophy of various algorithms, but I forgot to mention computational time. I wanted to try several classification algorithms on the dataset used to ...

4151 sym R (6386 sym/28 pcs) 8 img

Playing with Leaflet (and Radar locations)

30.09.2015

Yesterday, my friend Fleur showed me some interesting features of the leaflet package, in R. library(leaflet) To illustrate, consider the locations of (fixed) radars in several European countries. To get the data, use download.file("http://carte-gps-gratuite.fr/radars/zones-de-danger-destinator.zip","radar.zip") unzip("radar.zip") ext_r...

1539 sym R (1098 sym/5 pcs) 6 img

Visualising a Circular Density

07.10.2015

This afternoon, Jean-Luc asked me for some help with an old post I published, minuit, l’heure du crime, and with some graphs published a few days later in another post, where I used a different visualisation. The idea is that the hour can be seen as circular, in the sense that 23:58 is actually very close to 00:03. So when we use a nonparametric ker...

1969 sym R (1001 sym/3 pcs) 8 img
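
The circular idea in the teaser above (23:58 being close to 00:03) can be sketched as follows. This is only one common workaround, not necessarily the method used in the post: the sample is copied one period to the left and one to the right before a standard kernel estimate is computed, so that the estimate wraps around midnight. The data `h`, the bandwidth `bw = 1` and the grid size are assumptions made for illustration.

```r
set.seed(1)
# Hypothetical hours in [0, 24), concentrated around midnight (not the post's data)
h <- rnorm(500, mean = 0, sd = 2) %% 24

# Copy the sample one period to the left and one period to the right,
# so that observations just before midnight contribute just after it
h3 <- c(h - 24, h, h + 24)

# Standard Gaussian kernel estimate, evaluated on [0, 24] only
d <- density(h3, bw = 1, from = 0, to = 24, n = 512)

# The tripled sample spreads its mass over three periods,
# so rescale to get a density on a single period
d$y <- 3 * d$y

# Numerical check: trapezoidal integral of the estimate over [0, 24]
total_mass <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
```

With these assumptions, `plot(d$x, d$y, type = "l")` would draw an estimate that takes nearly the same value at 0 and at 24, which a plain `density(h)` on the raw hours would not.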

Tests, Power and Significance

14.10.2015

In the mathematical statistics course today, we started talking about tests, and decision rules. To illustrate all the concepts introduced today, we considered the case where we have a sample  with . And we want to test   against  In the course, we’ve seen that we could use a test based on the order statistics .  The test would be i.e. ...

2047 sym R (506 sym/5 pcs) 60 img

Statistical Tests: Asymptotic, Exact, or based on Simulations?

20.10.2015

This morning, in our mathematical statistics course, we’ve been discussing the ‘proportion test’, i.e. given a sample of Bernoulli trials, with , we want to test against  A natural test (which can be related to the maximum likelihood ratio test) is  based on the statistic The test function is here To get the bounds of the acceptance re...

1545 sym R (868 sym/5 pcs) 40 img

Applications of Chi-Square Tests

03.11.2015

This morning, in our mathematical statistics class, we’ve seen the use of the chi-square test. The first one was related to some goodness of fit of a multinomial distribution. Assume that . In order to test  against , use the statistic Under , . For instance, we have the number of weddings, in a large city, per season, > n=c(301,356,413,...

1929 sym R (1377 sym/14 pcs) 48 img
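
A goodness-of-fit test of the kind described in the teaser above can be sketched as follows, assuming the statistic in question is the usual Pearson chi-square and the null hypothesis is that all four seasons are equally likely. The counts below are hypothetical and are not the figures from the post, whose vector is truncated in the excerpt.

```r
# Hypothetical wedding counts per season (NOT the data from the post)
n <- c(winter = 280, spring = 350, summer = 420, autumn = 350)

# Assumed null hypothesis: each season has probability 1/4
p0 <- rep(1 / 4, 4)
expected <- sum(n) * p0

# Pearson statistic: sum over categories of (observed - expected)^2 / expected
Q <- sum((n - expected)^2 / expected)

# Under the null, Q is approximately chi-square with (4 - 1) degrees of freedom
pval <- pchisq(Q, df = length(n) - 1, lower.tail = FALSE)

# The same computation with the built-in function
ct <- chisq.test(n, p = p0)
```

Computing the statistic by hand and then with `chisq.test()` is mainly a sanity check that the degrees of freedom and the null probabilities are the intended ones.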

Variable Importance with Correlated Features

06.11.2015

Variable importance graphs are a great tool to see, in a model, which variables are interesting. Since we usually use them with random forests, it looks like they work well with (very) large datasets. The problem with large datasets is that a lot of features are ‘correlated’, and in that case, interpretation of the values of variable importanc...

2732 sym R (1072 sym/3 pcs) 48 img

Profile Likelihood

16.11.2015

Consider some simulated data > set.seed(1) > x=exp(rnorm(100)) Assume that those data are observed i.i.d. random variables with distribution, with . The natural idea is to consider the maximum likelihood estimator For instance, consider some maximum likelihood estimator, > library(MASS) > (F=fitdistr(x,"gamma")) shape rate 1.42144...

1727 sym R (1182 sym/6 pcs) 30 img
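
For the entry above, a profile likelihood for the shape parameter can be sketched as follows, assuming a gamma model parametrised by shape and rate as in the `fitdistr(x,"gamma")` call. The function name `prof_loglik`, the search interval and the grid are choices made for illustration and do not come from the post.

```r
set.seed(1)
x <- exp(rnorm(100))   # same simulated data as in the excerpt

# Profile log-likelihood of the shape: for a fixed shape a, the rate that
# maximises the gamma likelihood has the closed form a / mean(x)
prof_loglik <- function(a) {
  sum(dgamma(x, shape = a, rate = a / mean(x), log = TRUE))
}

# Maximise the profile over the shape (search interval is an assumption)
opt <- optimize(prof_loglik, interval = c(0.1, 10), maximum = TRUE)
a_hat <- opt$maximum

# Approximate 95% interval: shapes whose profile log-likelihood lies
# within qchisq(0.95, 1) / 2 of the maximum
grid <- seq(0.5, 3, by = 0.01)
pl <- sapply(grid, prof_loglik)
ci <- range(grid[pl >= opt$objective - qchisq(0.95, df = 1) / 2])
```

Profiling out the rate reduces the problem to one dimension, which is why a simple `optimize()` call and a grid are enough here.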

Additional thoughts about ‘Lorenz curves’ to compare models

28.11.2015

A few months ago, I mentioned a graph of some so-called Lorenz curves to compare regression models, see e.g. Progressive’s slides (thanks Guillaume for the reference). The idea is simple. Consider some model for the pure premium (in insurance, it is the quantity that we like to model), i.e. the conditional expected value On some dataset, we...

3700 sym R (3232 sym/12 pcs) 44 img

Inter-relationships in a matrix

01.12.2015

Last week, I wanted to display inter-relationships between data in a matrix. My friend Fleur, from AXA, mentioned an interesting possible application, in car accidents. In car-against-car accidents, it might be interesting to see which parts of the cars were involved. On https://www.data.gouv.fr/fr/, we can find such a dataset, with a lot o...

2114 sym R (825 sym/8 pcs) 6 img