Publications by Arthur Charpentier

A Million Random Digits: review of reviews


Recently on his blog (here), Robin mentioned an amazing book called “A Million Random Digits”, published by the RAND Corporation. The book was first published in 1955, but RAND has since published a nice (and expensive) second edition. A great thing is that on Amazon there are several extremely interesting reviews of the book, e.g. Didn’t like the...

4069 sym R (358 sym/1 pcs) 26 img

Playing with quantiles, part 1


A standard idea in extreme value theory (see e.g. here, in French unfortunately) is that to estimate the 99.5% quantile (say), we just need to estimate a quantile of level 95% for observations exceeding the 90% quantile. In extreme value theory, we assume that the 90% quantile (of the initial distribution) can be obtained easily, e.g. the empir...

2830 sym R (553 sym/1 pcs) 42 img
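
The level arithmetic behind the excerpt of "Playing with quantiles, part 1" can be checked numerically: exceeding the 99.5% quantile is the same as exceeding the 90% quantile (probability 0.10) and then landing in the top 5% of those exceedances, and 0.10 × 0.05 = 0.005. The sketch below is only an illustration under stated assumptions: the exponential sample, the sample size and the helper `empirical_quantile` are hypothetical choices, and plain empirical quantiles are used instead of the extreme value model the post goes on to discuss.

```python
import random


def empirical_quantile(sorted_values, level):
    """Crude empirical quantile: the order statistic at position floor(level * n)."""
    index = min(int(level * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


# Assumption: a simulated exponential sample stands in for real data.
random.seed(1)
sample = sorted(random.expovariate(1.0) for _ in range(200_000))

threshold_level = 0.90   # level of the threshold quantile
exceedance_level = 0.95  # level used among observations above the threshold

# Level reached in the original distribution: 1 - 0.10 * 0.05.
target_level = 1 - (1 - threshold_level) * (1 - exceedance_level)

threshold = empirical_quantile(sample, threshold_level)
exceedances = [x for x in sample if x > threshold]  # already sorted

direct = empirical_quantile(sample, target_level)
via_exceedances = empirical_quantile(exceedances, exceedance_level)

print("target level:", round(target_level, 6))
print("direct estimate:", direct)
print("estimate via exceedances:", via_exceedances)
```

The two estimates should essentially coincide, since both read the same region of the sorted sample; the interest of the extreme value approach lies in replacing the second empirical quantile by a parametric one.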

Playing with quantiles, part 2


It is common to look at the best times in the marathon, or perhaps at the distribution of the top 100, as done by John Myles White on his blog here (data can be found there); see the graph below, with the density of the times for the first 100 men (in blue) and the first 100 women (in red). Hence, men and women are different and men run faster than women. ...

2693 sym R (944 sym/1 pcs) 10 img 1 tbl

Want to say one thing and the exact opposite with strong confidence?


No need to do politics. Just take a statistics course. I am not talking about the misinterpretation of statistics, but about the mathematical foundations of statistical tests. Consider the following parametric test, with a one-dimensional parameter: versus , for some (fixed) . A standard way of doing such a test is to consid...

3893 sym R (3080 sym/6 pcs) 84 img

Circular or spherical data, and density estimation


A few years ago, while I was working on kernel-based density estimation for distributions with compact support (like copulas), I went through a series of papers on circular distributions. At that time, I thought it was something for mathematicians working on weird spaces… but during the past weeks, I saw several potential applications ...

5164 sym R (2535 sym/5 pcs) 82 img 6 tbl

Time horizon in forecasting, and rules of thumb


I recently received an email about forecasting and rules of thumb. “In the profession […] a rule of thumb is passed down, according to which one should use a history twice as long as the forecast horizon: 20 years of data for a 10-year forecast, etc. I would like to know whether this rule might, by any chance, have a t...

3799 sym R (82 sym/1 pcs) 20 img 3 tbl

Oscar awards: good actor versus good actress


I am not a big fan of those ceremonies, where some actors pretend that they are extremely happy to be there, and then some win a trophy, some don’t, and those who win start to cry, and those who did not get a trophy try to pretend that they are not affected, etc. The other reason is that, since I have several kids, I do not go to see the movies...

2802 sym R (959 sym/3 pcs) 12 img

Playing with robots


My son would be extremely proud if I told him I can spend hours building robots. Well, my robots are not as fancy as Dr Tenma’s, but they usually do what I ask them to do. For instance, it is extremely simple to build a robot with R, to extract data from websites. I have mentioned it here (on tennis matches), but it failed there (on NY Maratho...

2929 sym R (1106 sym/4 pcs) 6 img

Who will be the next President of the US?


A lot of weird facts (?) can be found on the internet. For instance, about the height of the winners of presidential elections in the US: the taller candidate always wins… “Still, being short does, on average, hurt a person’s prospects…The tall guy gets the girl. The taller presidential candidate almost always wins.” (here) “from 1900 to 1968 the...

2462 sym R (2150 sym/4 pcs) 6 img

Multivariate probit regression using (direct) maximum likelihood estimators


Consider a random pair of binary responses, i.e. with taking values 1 or 2. Assume that the probabilities can be functions of some covariates . The Gaussian vector latent structure: a standard model is based on a latent Gaussian structure, i.e. there exists some random vector such that if is lower than a given threshold, and 1 otherw...

1314 sym R (4672 sym/6 pcs) 22 img