Publications by Arthur Charpentier
Game of Friendship Paradox
In the introduction to my course next week, I will (briefly) mention networks, and I wanted to provide an illustration of the Friendship Paradox. On Network of Thrones (discussed in Beveridge and Shan (2016)), there is a dataset with the network of characters in Game of Thrones. The word “friend” might be a stretch here, but let’s continue ...
2731 sym R (844 sym/6 pcs) 4 img
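As a minimal sketch of the paradox itself, here is some base R code on a hypothetical five-node toy network (not the Network of Thrones data). It compares the average number of friends with the average, over nodes, of the mean number of friends their friends have. The edge list and all variable names are made up for illustration.

```r
# Hypothetical toy network (not the Network of Thrones data),
# given as an undirected edge list
edges <- data.frame(
  from = c("A", "A", "A", "A", "B", "C", "D"),
  to   = c("B", "C", "D", "E", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Degree = number of edges a node appears in
nodes   <- sort(unique(c(edges$from, edges$to)))
deg_tab <- table(factor(c(edges$from, edges$to), levels = nodes))
degree  <- setNames(as.integer(deg_tab), names(deg_tab))

# Neighbours ("friends") of a node, in either direction of the edge list
friends_of <- function(v) c(edges$to[edges$from == v], edges$from[edges$to == v])

# Average number of friends, versus average (over nodes) of the
# mean number of friends that their friends have
mean_degree         <- mean(degree)
mean_friends_degree <- sapply(nodes, function(v) mean(degree[friends_of(v)]))

c(mean_degree = mean_degree, friends = mean(mean_friends_degree))
```

On this toy graph the average degree is 2.8, while the friends' average is 3.1. Well-connected nodes show up as friends more often, which pulls the second average up. (Isolated nodes would need special handling, since they have no friends to average over.)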
Convex Regression Model
This morning, during the lecture on nonlinear regression, I (very) briefly mentioned convex regression. Since I forgot to mention the R code, I will publish it here. Assume that \(y_i=m(\mathbf{x}_i)+\varepsilon_i\), where \(m:\mathbb{R}^d\rightarrow \mathbb{R}\) is some convex function. Recall that \(m\) is convex if and only if \(\for...
3370 sym R (516 sym/5 pcs) 4 img
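For reference, here is one standard way to write the problem. It is a sketch and not necessarily the formulation used in the post's R code. Convexity of \(m\) can be stated as below. The least-squares convex regression estimator then replaces each \(m(\mathbf{x}_i)\) by a fitted value \(\theta_i\) and introduces a subgradient \(\boldsymbol{\xi}_i\) at each observation.

```latex
% Convexity of m (for all x, x' in R^d and lambda in [0,1])
m\big(\lambda\mathbf{x}+(1-\lambda)\mathbf{x}'\big)\le\lambda\, m(\mathbf{x})+(1-\lambda)\, m(\mathbf{x}')

% Least-squares convex regression: fitted values theta_i = m(x_i),
% subgradients xi_i at each x_i
\min_{\theta_1,\dots,\theta_n,\ \boldsymbol{\xi}_1,\dots,\boldsymbol{\xi}_n}\ \sum_{i=1}^n (y_i-\theta_i)^2
\quad\text{s.t.}\quad
\theta_j\ \ge\ \theta_i+\boldsymbol{\xi}_i^\top(\mathbf{x}_j-\mathbf{x}_i)\quad\text{for all } i,j
```

The constraints say that the affine function \(\theta_i+\boldsymbol{\xi}_i^\top(\mathbf{x}-\mathbf{x}_i)\) lies below every fitted point. This makes the fitted values consistent with a convex function, and the whole problem is a quadratic program in \((\theta,\boldsymbol{\xi})\).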
Combining automatically factor levels in R
Each time we face real applications in an applied econometrics course, we have to deal with categorical variables. And the same question arises from students: how can we automatically combine factor levels? Is there a simple R function? I have uploaded a few blog posts over the past years, but so far, nothing satisfying. Let me write down a few l...
3597 sym R (5695 sym/14 pcs) 12 img
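One possible heuristic is to cluster the levels on the average response they are associated with, using base R `hclust()` and `cutree()`. This is an assumption for illustration, not necessarily what the post ends up recommending. The data, the level effects and the number of groups `k` below are hypothetical.

```r
set.seed(1)
# Hypothetical data: a factor with 6 levels and a numeric response
n <- 600
x <- factor(sample(letters[1:6], n, replace = TRUE))
level_effect <- c(a = 0, b = 0.1, c = 2, d = 2.1, e = 5, f = 5.2)
y <- level_effect[as.character(x)] + rnorm(n, sd = 0.5)

# Average response per level
level_means <- tapply(y, x, mean)

# Hierarchical clustering of the levels on their mean response,
# cut into k groups (k = 3 is an arbitrary choice here)
k <- 3
groups <- cutree(hclust(dist(level_means)), k = k)

# Map each original level to its group to get the merged factor
x_merged <- factor(groups[as.character(x)])
table(x, x_merged)
```

Here the levels with close effects end up merged (a with b, c with d, e with f). With real data, `k` would have to be chosen, for instance by comparing models fitted with the merged factor.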
October, grant proposal season
In 2012, Danielle Herbert, Adrian Barnett, Philip Clarke and Nicholas Graves published an article entitled “On the time spent preparing grant proposals: an observational study of Australian researchers”, whose conclusions were reported in Nature under a more explicit title, “Australia’s grant system wastes time”! In this study, the...
2969 sym R (1179 sym/8 pcs) 12 img
Monte Carlo techniques to create counterfactuals
In last week’s STT5100 course, we saw how to use Monte Carlo simulations. The idea is that, in statistics, we observe a sample \(\{y_1,\cdots,y_n\}\) and, more generally, in econometrics, \(\{(y_1,\mathbf{x}_1),\cdots,(y_n,\mathbf{x}_n)\}\). But let’s get back to statistics (without covariates) to illustrate. We assume that ob...
4885 sym R (1252 sym/6 pcs) 10 img
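Here is a minimal sketch of the idea. It assumes the model is \(Y\sim\mathcal{N}(0,1)\) and that the statistic of interest is the sample mean. We generate many samples that could have been observed under the model and compare the observed statistic with their distribution. The sample size `n`, the number of replications `B` and the model itself are hypothetical choices.

```r
set.seed(123)
n <- 20
# "Observed" sample (hypothetical): here drawn from a standard normal
y <- rnorm(n)

# Monte Carlo: many counterfactual samples of size n under the assumed
# model N(0, 1), computing the same statistic on each
B <- 10000
sim_means <- replicate(B, mean(rnorm(n)))

# Where does the observed mean fall in the simulated distribution?
p_value <- mean(abs(sim_means) >= abs(mean(y)))
quantile(sim_means, c(0.025, 0.975))   # close to +/- 1.96 / sqrt(n)
```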
Solving the Chinese postman problem
A pre-Halloween post today. It actually started while I was in Barcelona: the kids wanted to go back to a store we had seen on the first day, in the Gothic Quarter, and I could not remember where it was. And I told myself that it would take quite a long time to walk along every street of the neighborhood. And I discovered that it was actually an old problem. In 1...
3170 sym R (3458 sym/11 pcs) 18 img
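For an undirected, connected graph, the classical solution works as follows. If every vertex has even degree, an Eulerian circuit traverses each street exactly once. Otherwise, the odd-degree vertices are paired up along shortest paths so that the total length of the duplicated paths is minimal. The sketch below computes only the optimal tour length on a small hypothetical street graph. It uses Floyd-Warshall for shortest paths and a brute-force matching of the odd vertices in place of Edmonds' blossom algorithm, which is fine for a handful of odd vertices but exponential in general.

```r
# Hypothetical street graph: undirected, connected, weighted edges
edges <- data.frame(
  from = c(1, 1, 2, 2, 3, 4),
  to   = c(2, 3, 3, 4, 4, 1),
  w    = c(3, 2, 1, 4, 5, 2)
)
nv <- max(c(edges$from, edges$to))

# All-pairs shortest path lengths (Floyd-Warshall)
d <- matrix(Inf, nv, nv)
diag(d) <- 0
for (e in seq_len(nrow(edges))) {
  i <- edges$from[e]
  j <- edges$to[e]
  d[i, j] <- min(d[i, j], edges$w[e])
  d[j, i] <- d[i, j]
}
for (k in 1:nv) {
  for (i in 1:nv) {
    for (j in 1:nv) {
      if (d[i, k] + d[k, j] < d[i, j]) d[i, j] <- d[i, k] + d[k, j]
    }
  }
}

# Vertices with an odd number of incident streets
deg <- tabulate(c(edges$from, edges$to), nbins = nv)
odd <- which(deg %% 2 == 1)

# Minimum-weight perfect matching of the odd vertices, by brute force:
# pair the first vertex with each other one and recurse on the rest
min_matching <- function(v) {
  if (length(v) == 0) return(0)
  best <- Inf
  for (m in 2:length(v)) {
    best <- min(best, d[v[1], v[m]] + min_matching(v[-c(1, m)]))
  }
  best
}

# Every street once, plus the cheapest extra paths between odd vertices
postman_length <- sum(edges$w) + min_matching(odd)
postman_length
```

Here all four vertices have odd degree. The cheapest pairing is 1-4 and 2-3 (extra length 3), so the tour has length 17 + 3 = 20. Building the actual route would require an Eulerian circuit on the graph with those paths duplicated, which is not shown.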
The “probability to win” is hard to estimate…
Real-time computation (or estimation) of the “probability to win” is difficult. We’ve seen that in soccer games, in elections… but actually, as a professor, I see it frequently when I grade my students. Consider a classical multiple-choice exam. After each question, imagine that you try to compute the probability that the student will p...
2911 sym R (606 sym/5 pcs) 8 img
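Here is a minimal sketch under strong (hypothetical) assumptions. The exam has `N` questions, and the student passes with at least `pass_mark` correct answers. Each remaining question is answered correctly, independently, with probability `p`. Under these assumptions, the probability of passing given what has been observed so far is a binomial tail probability.

```r
# Hypothetical exam: N questions, pass with at least pass_mark correct.
# Assumption: each remaining answer is correct independently with prob. p.
prob_pass <- function(n_correct, n_answered, N = 20, pass_mark = 10, p = 0.5) {
  remaining <- N - n_answered
  needed    <- pass_mark - n_correct
  # P(Binomial(remaining, p) >= needed); equals 1 when needed <= 0
  # and 0 when needed > remaining
  pbinom(needed - 1, size = remaining, prob = p, lower.tail = FALSE)
}

# "Real-time" probability for one simulated student
set.seed(42)
answers <- rbinom(20, 1, 0.55)   # 1 = correct answer
path <- sapply(0:20, function(k) prob_pass(sum(answers[seq_len(k)]), k))
round(path, 3)
```

The path starts at `1 - pbinom(9, 20, 0.5)`, about 0.59, and necessarily ends at exactly 0 or 1 once all questions are answered. In between, it can move a lot from one question to the next.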
NSERC – Discovery Grants Program, over the past 5 years
In a previous post, I discussed how it was possible to scrape the NSERC website to get statistics about Discovery Grants. Since we just got the new 2018 figures, I thought it would be a good opportunity to update my graphs: library(XML) library(stringr) url="http://www.nserc-crsng.gc.ca/NSERC-CRSNG/FundingDecisions-DecisionsFinancement/ResearchGrants...
1344 sym R (1458 sym/1 pcs) 12 img
Random thoughts on econometric models with (pure) random features
For my lectures on applied linear models, I wanted to illustrate the fact that the \(R^2\) is never a good measure of the quality of a model, since it is quite easy to increase it. Consider the following dataset n=100 df=data.frame(matrix(rnorm(n*n),n,n)) names(df)=c("Y",paste("X",1:99,sep="")) with one variable of interest \(y\), and 99 fea...
4032 sym R (1005 sym/8 pcs) 20 img
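Here is a short sketch along the same lines. The seed and the grid of model sizes `ks` are arbitrary choices. It fits nested linear models using the first `k` pure-noise features and records \(R^2\), which cannot decrease when features are added.

```r
set.seed(1)
n  <- 100
df <- data.frame(matrix(rnorm(n * n), n, n))
names(df) <- c("Y", paste("X", 1:99, sep = ""))

# R^2 of nested models using the first k (pure noise) features
ks <- c(1, 10, 25, 50, 75, 90)
r2 <- sapply(ks, function(k) summary(lm(Y ~ ., data = df[, 1:(k + 1)]))$r.squared)
round(setNames(r2, ks), 3)
```

In this null setting, \(R^2\) is on average about \(k/(n-1)\), so it climbs toward 1 even though no feature is related to \(y\). The full model, with 99 features and an intercept, would fit the 100 observations exactly.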
On the poor performance of classifiers in insurance models
Each time we have a case study in my actuarial courses (with real data), students are surprised to have a hard time getting a “good” model, and they are always surprised to get a low AUC when trying to model the probability of claiming a loss, of dying, of committing fraud, etc. And each time, I keep saying, “yes, I know, and that’s what we expect becaus...
4868 sym R (1416 sym/4 pcs) 4 img
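Here is one way to see why a low AUC is not, by itself, a sign of a bad model. This is a sketch with a hypothetical data-generating process, not taken from the post. We simulate a rare event whose true probability depends only weakly on a covariate, then compute the AUC of the true score using the rank-based (Mann-Whitney) formula.

```r
# AUC via the Mann-Whitney rank formula: the probability that a randomly
# chosen positive gets a higher score than a randomly chosen negative
auc <- function(score, y) {
  r  <- rank(score)           # ties get average ranks
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(2018)
n <- 10000
x <- rnorm(n)
# Hypothetical claim model: rare event, weak signal
p <- plogis(-3 + 0.3 * x)
y <- rbinom(n, 1, p)
auc(x, y)   # AUC of the true score, only modestly above 0.5
```

Because `p` is an increasing function of `x`, ranking by `x` is the same as ranking by the true probability. So, up to sampling noise, no classifier can be expected to beat this AUC on such data.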