Publications by Arthur Charpentier

More neurons in the hidden layer than predictive features in neural nets

25.10.2024

This week, we talked about neural networks for the first time, and I was saying that many illustrations of neural networks show a layer with fewer neurons than predictive variables, although it can sometimes make sense to have more neurons in the layer than predictive variables. To illustrate, consider a simple example with one sing...

3406 sym 24 img

The m=√p rule for random forests

19.10.2024

A couple of days ago, in our lab session, we discussed random forests, and, since it was based on the example in ISLR, we had a quick discussion about the random choice of features, and the “\(m=\sqrt{p}\)” rule. Interestingly, on that one, we can play a bit, and try all choices, and do it again, on a different train/test split, library(random...

1513 sym 6 img

Calculating an LOOCV MSE by hand

11.10.2024

Last week, we had a “mid-term” exam, for our introduction to statistical learning course. The question is simple: consider three points, \((x_i,y_i)\), here \(\{(0,2),(2,2),(3,1)\}\). Consider here some linear models, estimated using least squares techniques: what would be the leave-one-out cross-validation MSE? I like this exercise since we ...

1830 sym R (203 sym/1 pcs) 4 img
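
As a worked sketch of the exercise above, assume two candidate models (an assumption here, since the excerpt does not say which models were considered): a constant model \(y=\beta_0\) and a simple linear regression \(y=\beta_0+\beta_1 x\). Once one point is removed, only two points remain, so the least squares line passes exactly through them and the constant fit is their average; each leave-one-out prediction can then be computed by hand.

\[
\begin{aligned}
&\textbf{Linear model } y=\beta_0+\beta_1 x \\
&\text{drop }(0,2):\ \hat y(x)=4-x, && \hat y(0)=4, && (2-4)^2=4 \\
&\text{drop }(2,2):\ \hat y(x)=2-\tfrac{1}{3}x, && \hat y(2)=\tfrac{4}{3}, && \left(2-\tfrac{4}{3}\right)^2=\tfrac{4}{9} \\
&\text{drop }(3,1):\ \hat y(x)=2, && \hat y(3)=2, && (1-2)^2=1 \\
&\text{LOOCV MSE}=\tfrac{1}{3}\left(4+\tfrac{4}{9}+1\right)=\tfrac{49}{27}\approx 1.81 \\[6pt]
&\textbf{Constant model } y=\beta_0 \\
&\text{drop }(0,2):\ \hat y=\tfrac{2+1}{2}=\tfrac{3}{2}, && && \left(2-\tfrac{3}{2}\right)^2=\tfrac{1}{4} \\
&\text{drop }(2,2):\ \hat y=\tfrac{2+1}{2}=\tfrac{3}{2}, && && \left(2-\tfrac{3}{2}\right)^2=\tfrac{1}{4} \\
&\text{drop }(3,1):\ \hat y=\tfrac{2+2}{2}=2, && && (1-2)^2=1 \\
&\text{LOOCV MSE}=\tfrac{1}{3}\left(\tfrac{1}{4}+\tfrac{1}{4}+1\right)=\tfrac{1}{2}
\end{aligned}
\]

Under these assumptions, the constant model has the smaller leave-one-out MSE (\(1/2\) against \(49/27\)), even though the regression line fits the three points better in-sample.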

Some updates about the insurance datasets package (CASdatasets)

11.07.2024

Ten years ago, Computational Actuarial Science with R was published. With Christophe Dutang, we created at the same time an R package, collecting datasets used in the book. It was mainly to give access to the datasets to reproduce the applications, since functions used in the different chapters were coming from other R packages. Then, we started ad...

1778 sym R (247 sym/1 pcs) 2 img

Discrimination by proxy (a real case study)

15.02.2024

Yesterday, with Laurence Barry, we posted a blog post “Who benefits from data sharing?” explaining why data sharing, in insurance, could end mutualization. Actually, it can also be bad in the context of discrimination. Consider here the same dataset, with claim occurrence, in a real insurance portfolio, library(InsurFair) library(randomForest) ...

3845 sym 4 img

Tweedie regression, or Poisson-Gamma regressions?

08.02.2024

Yesterday, I was chatting with a young and enthusiastic actuary, who asked a nice (and classical) question: is it the same, or not, to use a Tweedie regression or two regressions (Poisson and Gamma)? For distributions, the two are equivalent, but when we have heterogeneity and explanatory variables, I really think that using all information, and ru...

5578 sym 8 img

Model selection, AIC and Tweedie regression

16.04.2023

Just some simple code to illustrate some points we will discuss this week, for the last course on GLMs, before the final exam. We have mentioned that the Gamma distribution belongs to the exponential family, so we can run a regression, and compute the associated AIC, > set.seed(123) > test.data = rgamma(n=2000, scale=1, shape=1) > m1 = glm( test.data...

2138 sym R (2028 sym/12 pcs) 10 img

Snow in Montréal (Canada)

29.01.2023

Winter started a bit more than one month ago… but we have already experienced many snow storms… there is still a lot of snow in gardens and in the streets. I was wondering if it was that unusual, but apparently not. Compared with last year (for the first months of winter, until the end of January), it is +50%, but it is comparable with prev...

1303 sym R (1344 sym/3 pcs) 4 img

Interpretability and explainability of predictive models

26.08.2022

In 400 AD, in his Confessiones, Augustine wrote “quid est ergo tempus? si nemo ex me quaerat, scio; si quaerenti explicare velim, nescio”, which can be translated as “What then is time? If no one asks me, I know what it is. If I wish to explain it to him who asks, I do not know.” To go a little further (because often, if we are asked to explain, we hav...

23108 sym R (4463 sym/18 pcs) 54 img

Monty Hall problem, with Thompson sampling

07.09.2022

We all know the Monty Hall problem. Recently, Jason Rosenhouse published a book on that topic (entitled The Monty Hall Problem: The Remarkable Story of Math’s Most Contentious Brain Teaser). The game is more or less described by the following question: Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door...

4310 sym R (1505 sym/8 pcs) 6 img