Publications by Arthur Charpentier

Studying joint effects in a regression

07.10.2010

We’ve seen in the previous post (here) how important the *-cartesian product is when modelling joint effects in a regression. Consider the case of two explanatory variates, one continuous (the age of the driver) and one qualitative (gasoline versus diesel). The additive model: assume here that […]. Then, given the exposure (assumed to be constant)...

2817 sym 32 img

Margin of error, and comparing proportions in the same sample

15.10.2010

I recently tried to answer a simple question, asked by @adelaigue. Actually, I thought that the answer would be obvious… but it is a little bit more complex than I thought. In a recent poll about the elections in Brazil, it was mentioned in a French newspaper that “Mme Rousseff, 62 ans, de 46,8% des intentions de vote et José Serra, 68 a...
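As a hedged aside on the question in the title: the excerpt stops before any formula, so the following is only the standard multinomial result, not necessarily the post's own derivation. The symbols $n$ (sample size, not given in the excerpt) and $\hat p_1$, $\hat p_2$ (the two candidates' observed shares in the same poll) are notation assumed here. Because both shares come from one sample, they are negatively correlated, which widens the margin of error on their difference compared with two independent polls:

```latex
% Assumed notation: n respondents, observed shares \hat p_1 and \hat p_2
% from the same multinomial sample.
\begin{align*}
\operatorname{Var}(\hat p_i) &= \frac{p_i (1 - p_i)}{n}, \qquad
\operatorname{Cov}(\hat p_1, \hat p_2) = -\frac{p_1 p_2}{n}, \\
\operatorname{Var}(\hat p_1 - \hat p_2)
  &= \frac{p_1 (1 - p_1) + p_2 (1 - p_2) + 2\, p_1 p_2}{n}.
\end{align*}
% Approximate 95% margin of error on the gap:
% 1.96 \sqrt{\operatorname{Var}(\hat p_1 - \hat p_2)}, with p_i replaced by \hat p_i.
```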

4122 sym 68 img

A million? What are the odds…

27.10.2010

50 days ago, I published a post, here, on forecasting techniques. I was wondering what the probability could be of reaching, by the end of this year, one million page views (from Google Analytics) on this blog. Well, initially, it was on my blog at the Université de Rennes 1 (http://blogperso.univ-rennes1.fr/arthur.charpentier/), but since I trans...

3474 sym 12 img

Names of villages, in France

02.11.2010

Keith Briggs published a post here on the distribution of English place-name elements, which contains almost twenty maps like the one where names end in -bourn, -bourne, -burn (here) or -head (there). Actually, it is possible (Robin mentioned that already here) to do similar things in France… Consider the dataset containing the 35,250 commune nam...

3880 sym 38 img 9 tbl

Comments on probabilities

02.11.2010

The only thing I remember from the probability courses I took a few years ago is that we have to clearly define the event whose probability we want to calculate. On the Freakonomics blog, last week, the Israeli lottery was mentioned (here, see also there where I mentioned that, and odd facts from the French lottery). Yesterday, Andrew Gelman ...

3989 sym 18 img

My residuals look weird… don’t they?

03.11.2010

Since I got the same question twice, let us look at it quickly… Some students showed me a graph (from a Poisson regression) which looked like that, and they asked “isn’t it weird?”, i.e. “residuals are null or positive… this is not what we should have, right?”. Actually, residuals are always centered in a glm regression (if you ke...

3192 sym 6 img

Splines: opening the (black) box…

04.11.2010

Splines in regression can look like a black box (or maybe like some dishes you get when you travel away from home: it tastes good, but you don’t know what’s inside… even if you might have some clues, you never know for sure*). With splines, it is the same: there are knots, then we consider polynomial interpolation...

7893 sym 42 img

Pretty R code in the blog

05.11.2010

David Smith (alias @revodavid, see also on the Revolutions blog, here) pointed out that my R code was not easy to read (not only due to my computing skills, but mainly because of the typography I use). He suggested that I use the Pretty R tool (here). And I will… So, just to quickly answer a question I received by email (a few weeks ago, sor...

884 sym R (531 sym/1 pcs) 2 img

Updating meteorological forecasts, part 1

07.11.2010

As Mark Twain said, “the art of prophecy is very difficult, especially about the future” (well, actually I am not sure Mark Twain was the first one to say so, but if you’re interested in that sentence, you can look here). I have been rather surprised to see how interested Canadians can be in weather, and weather forecasts (see e.g. here fo...

5111 sym R (894 sym/2 pcs) 72 img

Generating a quasi Poisson distribution, version 2

10.11.2010

Here and there, I mentioned two pieces of code to generate quasi-Poisson random variables. And in both of them, the negative binomial approximation seems to be wrong. Recall that the negative binomial distribution is […] where […], and in R, a negative binomial distribution can be parametrized using two parameters out of the following ones: the size, the proba...
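The formula is missing from the excerpt above, so what follows is only a sketch of one standard way to write it, using the convention of R's `dnbinom` (arguments `size`, `prob`, `mu`). The symbols $r$ (size), $p$ (probability), $\mu$ (mean) and the dispersion $\varphi$ are notation assumed here, not taken from the post:

```latex
% Assumed notation: r = size, p = prob, \mu = mean (as in R's dnbinom).
\begin{align*}
\mathbb{P}(N = k) &= \frac{\Gamma(k + r)}{k!\,\Gamma(r)}\, p^{r} (1 - p)^{k},
  \qquad k = 0, 1, 2, \dots \\
\mu = \mathbb{E}[N] &= \frac{r (1 - p)}{p}, \qquad
\operatorname{Var}(N) = \frac{r (1 - p)}{p^{2}} = \frac{\mu}{p} = \mu + \frac{\mu^{2}}{r}.
\end{align*}
% Matching a quasi-Poisson variance \operatorname{Var}(N) = \varphi \mu, with \varphi > 1:
% \mu / p = \varphi \mu gives p = 1/\varphi, and then r = \mu / (\varphi - 1).
```

Under these assumptions, any two of size, probability and mean determine the third, which is why only two of them need to be supplied.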

1241 sym R (578 sym/3 pcs) 24 img