Publications by Arthur Charpentier
Studying joint effects in a regression
We’ve seen in the previous post (here) how important the *-cartesian product is for modelling joint effects in a regression. Consider the case of two explanatory variates, one continuous (the age of the driver) and one qualitative (gasoline versus diesel). The additive model: assume here that … Then, given the exposure (assumed to be constant)...
2817 sym 32 img
Margin of error, and comparing proportions in the same sample
I recently tried to answer a simple question, asked by @adelaigue. Actually, I thought that the answer would be obvious… but it is a little bit more complex than I thought. In a recent poll about elections in Brazil, it was mentioned in a French newspaper that “Mme Rousseff, 62 ans, de 46,8% des intentions de vote et José Serra, 68 a...
4122 sym 68 img
A million ? what are the odds…
Fifty days ago, I published a post, here, on forecasting techniques. I was wondering what the probability was of reaching, by the end of this year, one million page views (according to Google Analytics) on this blog. Well, initially, it was on my blog at the Université de Rennes 1 (http://blogperso.univ-rennes1.fr/arthur.charpentier/), but since I trans...
3474 sym 12 img
Names of villages, in France
Keith Briggs published a post here on the distribution of English place-name elements, which contains almost twenty maps, like the one where names end in -bourn, -bourne, -burn (here) or -head (there). Actually, it is possible (Robin mentioned that already here) to do similar things in France… Consider the dataset containing the 35,250 commune nam...
3880 sym 38 img 9 tbl
Comments on probabilities
The only thing I remember from the probability courses I took a few years ago is that we have to clearly define the event whose probability we want to calculate. On the Freakonomics blog, last week, the Israeli lottery was mentioned (here, see also there where I mentioned that, and odd facts from the French lottery). Yesterday, Andrew Gelman ...
3989 sym 18 img
My residuals look weird… aren’t they ?
Since I got the same question twice, let us look at it quickly… Some students showed me a graph (from a Poisson regression) which looks like this, and they asked “isn’t it weird?”, i.e. “residuals are null or positive… this is not what we should have, right?”. Actually, residuals are always centered in a glm regression (if you ke...
3192 sym 6 img
Splines: opening the (black) box…
Splines in regression are something that looks like a black box (or maybe like some dishes you get when you travel away from home: it tastes good, but you don’t know what’s inside… even if you might have some clues, you never know for sure*). With splines, it is the same: there are knots, then we consider polynomial interpolation...
7893 sym 42 img
Pretty R code in the blog
David Smith (alias @revodavid, see also on the Revolutions blog, here) pointed out that my R code was not easy to read (not only due to my computing skills, but mainly because of the typography I use). He suggested that I use the Pretty R tool (here). And I will… So, just to quickly answer a question I received by email (a few weeks ago, sor...
884 sym R (531 sym/1 pcs) 2 img
Updating meteorological forecasts, part 1
As Mark Twain said, “the art of prophecy is very difficult, especially about the future” (well, actually I am not sure Mark Twain was the first one to say so, but if you’re interested in that sentence, you can look here). I have been rather surprised to see how interested Canadians can be in weather, and weather forecasts (see e.g. here fo...
5111 sym R (894 sym/2 pcs) 72 img
Generating a quasi Poisson distribution, version 2
Here and there, I mentioned two pieces of code to generate quasi-Poisson random variables. And in both of them, the negative binomial approximation seems to be wrong. Recall that the negative binomial distribution is … where …, and in R, a negative binomial distribution can be parametrized using two parameters out of the following: the size, the proba...
1241 sym R (578 sym/3 pcs) 24 img