Publications by Arthur Charpentier

An Update on Boosting with Splines

02.07.2015

In my previous post, An Attempt to Understand Boosting Algorithm(s), I was puzzled by the convergence of boosting when I was using some spline functions (more specifically, piecewise linear and continuous regression functions). I was using
> library(splines)
> fit=lm(y~bs(x,degree=1,df=3),data=df)
The problem with that spline function is that knots s...

1514 sym R (1073 sym/4 pcs) 18 img
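The following is a minimal sketch of this puzzle. It assumes simulated data (a noisy sine curve), L2 boosting on residuals, and a shrinkage parameter nu = 0.1, none of which are taken from the post. The key point is that bs() places the knots at fixed quantiles of x. Every weak learner is therefore a projection onto the same spline space, so the boosted fit should converge to the single least-squares spline fit.

```r
library(splines)

# Simulated data (assumption, not the post's dataset)
set.seed(1)
n <- 200
x <- sort(runif(n, 0, 10))
y <- sin(x) + rnorm(n, sd = 0.3)

nu <- 0.1       # shrinkage / learning rate (assumed value)
n_iter <- 200   # number of boosting steps (assumed value)

boost_fit <- rep(0, n)   # start from the null model
for (k in seq_len(n_iter)) {
  r <- y - boost_fit                              # current residuals
  step <- lm(r ~ bs(x, degree = 1, df = 3))       # weak learner: same knots at every step
  boost_fit <- boost_fit + nu * fitted(step)      # shrunken update
}

# Single least-squares fit on the same piecewise linear spline basis
lm_fit <- lm(y ~ bs(x, degree = 1, df = 3))
max(abs(boost_fit - fitted(lm_fit)))
```

Starting from a zero fit, each step shrinks the remaining gap by a factor (1 - nu). After k steps, the boosted fit is exactly (1 - (1 - nu)^k) times the least-squares fit, so it never leaves that spline space.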

Choosing a Classifier

21.07.2015

To illustrate the problem of choosing a classification model, consider some simulated data:
> n = 500
> set.seed(1)
> X = rnorm(n)
> ma = 10-(X+1.5)^2*2
> mb = -10+(X-1.5)^2*2
> M = cbind(ma,mb)
> set.seed(1)
> Z = sample(1:2,size=n,replace=TRUE)
> Y = ma*(Z==1)+mb*(Z==2)+rnorm(n)*5
> df = data.frame(Z=as.factor(Z),X,Y)
A first strategy is...

3055 sym R (3671 sym/23 pcs) 40 img
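The following is a hedged sketch, not necessarily the first strategy of the post. It fits two logistic regressions on the simulated data and compares their in-sample misclassification rates. Under this simulation, X is independent of Z and the noise has variance 25. The true log-odds of Z = 2 therefore works out to (4X^2 - 11)(2Y + 12X)/50, which is cubic in X and linear in Y. The second model, with poly(X, 3) * Y, contains that log-odds.

```r
# Data-generating code from the excerpt
n = 500
set.seed(1)
X = rnorm(n)
ma = 10-(X+1.5)^2*2
mb = -10+(X-1.5)^2*2
M = cbind(ma,mb)
set.seed(1)
Z = sample(1:2,size=n,replace=TRUE)
Y = ma*(Z==1)+mb*(Z==2)+rnorm(n)*5
df = data.frame(Z=as.factor(Z),X,Y)

# glm() models P(Z = "2"), since "1" is the reference level of the factor
fit_lin <- glm(Z ~ X + Y, data = df, family = binomial)
fit_cub <- glm(Z ~ poly(X, 3) * Y, data = df, family = binomial)

# In-sample misclassification rate with a 0.5 threshold
miscl <- function(fit) {
  pred <- ifelse(predict(fit, type = "response") > 0.5, "2", "1")
  mean(pred != as.character(df$Z))
}
err <- c(linear = miscl(fit_lin), cubic = miscl(fit_cub))
err
```

In-sample error rates are optimistic. A validation sample or cross-validation gives a fairer comparison between the two models.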

Visualising Claims Frequency

28.07.2015

A few years ago, I published a post to visualize an empirical claims frequency in a portfolio. I wanted to update the code. Here is some code to get a dataset:
sinistre <- read.table("http://freakonometrics.free.fr/sinistreACT2040.txt",header=TRUE,sep=";")
sinistres=sinistre[sinistre$garantie=="1RC",]
contrat <- read.table("http://freakonomet...

807 sym R (3000 sym/4 pcs) 4 img
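The datasets above are read from URLs, so here is a self-contained sketch on simulated stand-in tables. The column names (nocontrat, exposition, ageconducteur) and the age bands are assumptions for illustration. The empirical claims frequency of a group is the total number of claims divided by the total exposure (in years) of the policies in that group.

```r
set.seed(1)
n <- 1000
# Hypothetical stand-in for the contract table (column names are assumptions)
contrat <- data.frame(nocontrat = 1:n,
                      exposition = runif(n, 0.1, 1),
                      ageconducteur = sample(18:80, n, replace = TRUE))
# Hypothetical stand-in for the claims table: one row per claim
sinistres <- data.frame(nocontrat = sample(1:n, 150, replace = TRUE))

# Count claims per policy and attach the counts to the contracts (0 if no claim)
nb <- table(sinistres$nocontrat)
contrat$nbre <- 0
contrat$nbre[match(as.integer(names(nb)), contrat$nocontrat)] <- as.vector(nb)

# Empirical frequency per age band: total claims / total exposure
contrat$age_band <- cut(contrat$ageconducteur, breaks = c(17, 25, 40, 60, 80))
freq <- tapply(contrat$nbre, contrat$age_band, sum) /
        tapply(contrat$exposition, contrat$age_band, sum)
freq
```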

Modelling Occurence of Events, with some Exposure

28.07.2015

This afternoon, an interesting point was raised, and I wanted to come back to it (since I published a post on that same topic a long time ago). How can we adapt a logistic regression when the observations do not all have the same exposure? Here the model is the following: , the occurrence of an event  on the period  is unobserved the occuren...

4517 sym R (3105 sym/22 pcs) 46 img
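One standard way to handle unequal exposure is shown below; the post may use another construction. It assumes that events arrive as a Poisson process with rate lambda = exp(b0 + b1 x). Under that assumption, the probability of observing an event during an exposure E is 1 - exp(-lambda E), so cloglog(p) = log(lambda) + log(E). A binomial GLM with a complementary log-log link and an offset log(E) then estimates b0 and b1. The sketch checks this on simulated data with hypothetical coefficients b0 = -1 and b1 = 0.5.

```r
set.seed(1)
n <- 10000
x <- rnorm(n)
E <- runif(n, 0.1, 1)                    # exposure, as a fraction of the period (simulated)
lambda <- exp(-1 + 0.5 * x)              # event rate over a full period (hypothetical coefficients)
y <- rbinom(n, size = 1, prob = 1 - exp(-lambda * E))   # event observed during exposure E

# cloglog(p) = log(-log(1 - p)) = log(lambda) + log(E), hence the offset log(E)
fit <- glm(y ~ x + offset(log(E)), family = binomial(link = "cloglog"))
coef(fit)   # should be close to c(-1, 0.5)
```

Without the offset, a policy observed for one month would be treated like a policy observed for the whole period.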

Computing AIC on a Validation Sample

29.07.2015

This afternoon, we saw during the data science training that the AIC criterion can be used for model selection.
> library(splines)
> AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="log")))
[1] 438.6314
> AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="identity")))
[1] 436.3997
> AIC(glm(dist ~ bs(speed), d...

2311 sym R (3196 sym/13 pcs) 6 img
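The following is a minimal sketch of an AIC computed on a validation sample. It assumes a random split of the built-in cars data into train_cars (35 rows) and a hypothetical valid_cars (15 rows); the split used in the post is not shown. The log-likelihood is evaluated on the validation observations, using the parameters estimated on the training sample, and the usual 2k penalty is kept. The identity-link model is left out here: depending on the split, a Poisson GLM with identity link may need starting values to keep the fitted means positive.

```r
library(splines)

set.seed(1)
idx <- sample(seq_len(nrow(cars)), size = 35)   # assumed split: 35 training rows, 15 validation rows
train_cars <- cars[idx, ]
valid_cars <- cars[-idx, ]

# Hypothetical helper: -2 * (Poisson log-likelihood on newdata) + 2 * (number of coefficients)
aic_valid <- function(model, newdata) {
  mu <- predict(model, newdata = newdata, type = "response")
  loglik <- sum(dpois(newdata$dist, lambda = mu, log = TRUE))
  -2 * loglik + 2 * length(coef(model))
}

m_log <- glm(dist ~ speed, data = train_cars, family = poisson(link = "log"))
m_bs  <- glm(dist ~ bs(speed), data = train_cars, family = poisson(link = "log"))
res <- sapply(list(log = m_log, bs = m_bs), aic_valid, newdata = valid_cars)
res
```

On the training sample itself, aic_valid() returns the same value as AIC() for a Poisson GLM, which is a useful sanity check.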

Pricing Game

22.08.2015

In November, with Romuald Elie and Jérémie Jakubowicz, we will organize a session during the 100% Actuaires day in Paris, based on a “pricing game”. We provide two datasets (motor insurance, third-party claims) with 2 years of experience and 100,000 policies. Each ‘team’ has to submit a premium proposal for 36,000 potential...

1125 sym R (210 sym/1 pcs)

“A 99% TVaR is generally a 99.6% VaR”

29.08.2015

Almost 6 years ago, I posted a brief comment on a sentence that I found surprising at the time, in a report claiming that “the expected shortfall […] at the 99% level corresponds quite closely to the […] value-at-risk at a 99.6% level”. That sentence was inspired by a remark in the Swiss Experience report, “expected shortfall […] on a 99% c...

1788 sym R (937 sym/7 pcs) 12 img
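Here is a quick numerical check for two textbook distributions chosen for this example: a standard Gaussian and a Student t with 3 degrees of freedom, neither taken from the post. The sketch computes the 99% TVaR as the VaR plus the expected excess over the VaR divided by 1%. It then finds the level p at which the VaR equals that TVaR, i.e. p = F(TVaR). For the Gaussian, the 99% TVaR is dnorm(qnorm(0.99))/0.01, about 2.665, and the matching VaR level is about 99.6%.

```r
# TVaR_p = VaR_p + E[(X - VaR_p)_+] / (1 - p),
# where E[(X - v)_+] is the integral of the survival function above v
tvar <- function(p, qfun, survfun) {
  v <- qfun(p)
  v + integrate(survfun, lower = v, upper = Inf)$value / (1 - p)
}

p <- 0.99
# Standard Gaussian
tvar_norm  <- tvar(p, qnorm, function(x) pnorm(x, lower.tail = FALSE))
level_norm <- pnorm(tvar_norm)     # VaR level whose quantile equals the 99% TVaR
# Student t with 3 degrees of freedom (heavier tails)
tvar_t  <- tvar(p, function(u) qt(u, df = 3), function(x) pt(x, df = 3, lower.tail = FALSE))
level_t <- pt(tvar_t, df = 3)

c(gaussian = level_norm, student_t3 = level_t)
```

The matching level depends on the tail of the distribution. The Student t line lets you compare a heavier tail with the Gaussian case.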

On NCDF Climate Datasets

03.09.2015

In mid-November, a nice workshop on big data and the environment will be organized in Argentina. We will talk a lot about climate models, and I wanted to play a little bit with those data, stored on http://dods.ipsl.jussieu.fr/mc2ipsl/. Since Ewen (aka @3wen) has been working on those datasets recently, he kindly told me how to read those dataset...

2720 sym R (2451 sym/17 pcs) 6 img

Minimalist Maps

05.09.2015

This week, I mentioned a series of maps on Twitter: “some minimalist maps http://t.co/YCNPf3AR9n (poke @visionscarto) pic.twitter.com/Ip9Tylsbkv” (Arthur Charpentier, @freakonometrics, 2 September 2015). Friday evening, just before leaving the office to pick up the kids after their first week back in class, Matthew Champion (aka @matthewchampion...

1854 sym R (773 sym/7 pcs) 14 img

Convergence and Asymptotic Results

24.09.2015

Last week, in our mathematical statistics course, we saw the law of large numbers (which was proven in the probability course). It states that, given a collection X_1, ..., X_n of i.i.d. random variables with finite mean μ = E(X_1), the sample mean (X_1 + ... + X_n)/n converges to μ as n → ∞. To visualize that convergence, we can use
> m=100
> mean_samples=function(n=10){
+ X=matrix(rnorm(n*m),nrow=m,ncol=n)
+ return(apply(X,1...

1546 sym R (1398 sym/6 pcs) 24 img
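This convergence can also be quantified. The sketch below assumes standard Gaussian draws, so the true mean is 0, and uses a hypothetical helper sample_means() rather than the post's truncated mean_samples(). The spread of the m sample means around 0 shrinks as n grows, roughly like 1/sqrt(n).

```r
m <- 100  # number of replicated samples, as in the excerpt

# Hypothetical helper (not the post's truncated function):
# m sample means, each computed from n i.i.d. N(0,1) draws
sample_means <- function(n = 10) {
  X <- matrix(rnorm(n * m), nrow = m, ncol = n)
  apply(X, 1, mean)
}

set.seed(1)
ns <- c(10, 100, 1000)
spread <- sapply(ns, function(n) sd(sample_means(n)))
names(spread) <- ns
spread  # standard deviation of the sample means, for each n
```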