Publications by Arthur Charpentier

Game of Friendship Paradox

27.06.2018

In the introduction of my course next week, I will (briefly) mention networks, and I wanted to provide some illustration of the Friendship Paradox. On Network of Thrones (discussed in Beveridge and Shan (2016)), there is a dataset with the network of characters in Game of Thrones. The word “friend” might be an abuse of language here, but let’s continue ...
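To make the paradox concrete, here is a minimal R sketch on a hypothetical five-node toy network (not the Game of Thrones data, whose file and column names are not given here). It compares the average degree with the average, over individuals, of their friends' mean degree. Applying \(x + 1/x \ge 2\) edge by edge shows the second can never be smaller in a graph without isolated nodes; on this toy graph the values are 2.8 and 3.1.

```r
# Hypothetical toy network (NOT the Game of Thrones data): undirected edge list
edges <- data.frame(
  from = c("A", "A", "A", "A", "B", "C", "D"),
  to   = c("B", "C", "D", "E", "C", "D", "E"),
  stringsAsFactors = FALSE
)
nodes <- sort(unique(c(edges$from, edges$to)))

# Degree of each node = number of edges it belongs to
deg <- table(factor(c(edges$from, edges$to), levels = nodes))
deg <- setNames(as.numeric(deg), nodes)

# For each node, the mean degree of its neighbours ("friends")
friend_avg <- sapply(nodes, function(v) {
  nb <- c(edges$to[edges$from == v], edges$from[edges$to == v])
  mean(deg[nb])
})

mean_degree <- mean(deg)                # average number of friends
mean_friend_degree <- mean(friend_avg)  # average number of friends of friends
c(mean_degree = mean_degree, mean_friend_degree = mean_friend_degree)
```

Node A, with four friends, is everyone's friend, which is what pulls the friends' average up.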

Convex Regression Model

05.07.2018

This morning, during the lecture on nonlinear regression, I (very) briefly mentioned the case of convex regression. Since I forgot to mention the R code, I will publish it here. Assume that \(y_i=m(\mathbf{x}_i)+\varepsilon_i\) where \(m:\mathbb{R}^d\rightarrow \mathbb{R}\) is some convex function. Then \(m\) is convex if and only if \(\for...

Combining automatically factor levels in R

06.10.2018

Each time we work on real applications in an applied econometrics course, we have to deal with categorical variables. And the same question arises from students: how can we automatically combine factor levels? Is there a simple R function? I have uploaded a few blog posts over the past years, but so far, nothing satisfying. Let me write down a few l...
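One simple heuristic (an assumption here, not necessarily what the post ends up proposing) is to cluster the levels on their mean response and merge levels that fall in the same cluster. A minimal base-R sketch on simulated data, where the true grouping of eight hypothetical levels is known:

```r
set.seed(1)
# Hypothetical data: 8 levels that actually share only 3 distinct effects
lev <- factor(sample(letters[1:8], 500, replace = TRUE))
effect <- c(a = 0, b = 0, c = 1, d = 1, e = 1, f = 3, g = 3, h = 0)
y <- effect[as.character(lev)] + rnorm(500, sd = 0.5)

# Mean response for each level
means <- tapply(y, lev, mean)

# Hierarchical clustering of the level means, cut into k groups
k <- 3
groups <- cutree(hclust(dist(means)), k = k)

# Recode the original factor with the merged levels
new_lev <- factor(groups[as.character(lev)])
table(lev, new_lev)
```

The number of groups `k` is fixed by hand here; in practice it (and the linkage) would still have to be chosen, for instance by comparing the fitted models.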

October, grant proposal season

09.10.2018

In 2012, Danielle Herbert, Adrian Barnett, Philip Clarke and Nicholas Graves published an article entitled “On the time spent preparing grant proposals: an observational study of Australian researchers”, whose conclusions were reported in Nature under a more explicit title, “Australia’s grant system wastes time”! In this study, the...

Monte Carlo techniques to create counterfactuals

11.10.2018

In last week's STT5100 course, we saw how to use Monte Carlo simulations. The idea is that, in statistics, we observe a sample \(\{y_1,\cdots,y_n\}\) and, more generally, in econometrics \(\{(y_1,\mathbf{x}_1),\cdots,(y_n,\mathbf{x}_n)\}\). But let’s get back to statistics (without covariates) to illustrate. We assume that ob...
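A minimal base-R sketch of the idea, assuming (hypothetically) that observations are i.i.d. exponential with rate 1: we simulate many counterfactual samples of the same size as the observed one, and look at where the observed statistic, here the sample mean, falls in their distribution. The "observed" sample is itself simulated, for illustration only.

```r
set.seed(1)
n <- 20
y <- rexp(n)  # stand-in for the observed sample (hypothetical)

# Monte Carlo: under the assumed model (i.i.d. Exp(1)), generate B
# counterfactual samples of size n and record the sample mean of each
B <- 10000
sim_means <- replicate(B, mean(rexp(n, rate = 1)))

# Where does the observed mean fall among the simulated ones?
p_value <- mean(sim_means >= mean(y))
quantile(sim_means, c(0.025, 0.975))
p_value
```

Any other statistic (median, maximum, a test statistic) can be plugged in instead of `mean` without changing the structure.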

Solving the Chinese postman problem

19.10.2018

A pre-Halloween post today. It actually started while I was in Barcelona: the kids wanted to go back to a store we had seen on the first day, in the Gothic Quarter, and I could not remember where it was. I told myself that it would take quite long to walk through all the streets of the neighborhood. And I discovered that this was actually an old problem. In 1...

The “probability to win” is hard to estimate…

06.11.2018

Real-time computation (or estimation) of the “probability to win” is difficult. We’ve seen that in soccer games, in elections… but actually, as a professor, I see it frequently when I grade my students. Consider a classic multiple-choice exam. After each question, imagine that you try to compute the probability that the student will p...
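A minimal sketch under strong simplifying assumptions (ours, not necessarily the post's): questions are independent, each remaining question is answered correctly with the same probability \(p\), naively estimated by the current proportion of correct answers, and passing requires a hypothetical threshold of 10 correct answers out of 20. After \(k\) questions with \(s\) correct, the number \(X\) of future correct answers is Binomial\((n-k,p)\), so the probability to pass is \(P(X \ge t - s)\).

```r
# Probability that a student ends with at least `threshold` correct answers
# out of `n`, given `s` correct answers after the first `k` questions.
# Assumptions (hypothetical): independent questions, constant success
# probability p, naively estimated by s / k.
prob_pass <- function(s, k, n = 20, threshold = 10, p = s / k) {
  remaining <- n - k
  needed <- threshold - s
  if (needed <= 0) return(1)         # already passed
  if (needed > remaining) return(0)  # can no longer pass
  # P(X >= needed) with X ~ Binomial(remaining, p)
  pbinom(needed - 1, size = remaining, prob = p, lower.tail = FALSE)
}

# Hypothetical answers (1 = correct) to the first 10 of 20 questions
answers <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 1)
probs <- sapply(seq_along(answers), function(k) prob_pass(sum(answers[1:k]), k))
round(probs, 3)
```

Early on, \(p\) is estimated from very few answers (after one correct answer, \(\hat p = 1\) and the estimated probability is 1), which is one reason such real-time estimates jump around so much.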

NSERC – Discovery Grants Program, over the past 5 years

07.02.2019

In a previous post, I discussed how it was possible to scrape the NSERC website to get stats about Discovery Grants. Since we just got the new 2018 figures, I thought it would be a good opportunity to update my graphs:
    library(XML)
    library(stringr)
    url="http://www.nserc-crsng.gc.ca/NSERC-CRSNG/FundingDecisions-DecisionsFinancement/ResearchGrants...

Random thoughts on econometric models with (pure) random features

16.02.2019

For my lectures on applied linear models, I wanted to illustrate the fact that \(R^2\) is never a good measure of the goodness of a model, since it is quite easy to improve it. Consider the following dataset,
    n=100
    df=data.frame(matrix(rnorm(n*n),n,n))
    names(df)=c("Y",paste("X",1:99,sep=""))
with one variable of interest \(y\), and 99 fea...
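Continuing with that dataset, a short sketch (the loop and the `r2`, `adj_r2` names are ours): regressing \(y\) on the first \(k\) pure-noise features for \(k = 1, \dots, 98\) gives an \(R^2\) that never decreases with \(k\), since adding a regressor to a nested OLS model cannot increase the residual sum of squares. We stop at 98 features because, with 99 features plus an intercept, the 100 observations would be fitted exactly.

```r
set.seed(1)
n <- 100
df <- data.frame(matrix(rnorm(n * n), n, n))
names(df) <- c("Y", paste("X", 1:99, sep = ""))

# Regress Y on the first k noise features, for k = 1, ..., 98
fits <- lapply(1:98, function(k) {
  fml <- reformulate(paste("X", 1:k, sep = ""), response = "Y")
  summary(lm(fml, data = df))
})
r2     <- sapply(fits, function(s) s$r.squared)
adj_r2 <- sapply(fits, function(s) s$adj.r.squared)
tail(cbind(k = 1:98, r2, adj_r2))
```

The adjusted \(R^2\) is printed alongside for contrast, since it penalizes the number of regressors instead of rewarding it.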

On the poor performance of classifiers in insurance models

13.03.2019

Each time we have a case study in my actuarial courses (with real data), students are surprised to have a hard time getting a “good” model, and they are always surprised to get a low AUC when trying to model the probability of claiming a loss, of dying, of committing fraud, etc. And each time, I keep saying, “yes, I know, and that’s what we expect becaus...
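A small simulation can illustrate why a low AUC is not necessarily the sign of a bad model. Assuming a hypothetical data-generating process with a rare event and a weak covariate effect (our choice of coefficients), even the correctly specified logistic regression only reaches a modest AUC. The `auc()` helper is our own, based on the Mann-Whitney rank formulation.

```r
# AUC via the Mann-Whitney formulation: probability that a random positive
# gets a higher score than a random negative (ties count as 1/2)
auc <- function(score, y) {
  r  <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
n <- 10000
x <- rnorm(n)
# Hypothetical data-generating process: rare event, weak covariate effect
y <- rbinom(n, 1, plogis(-2 + 0.3 * x))
fit <- glm(y ~ x, family = binomial)  # correctly specified model
auc(predict(fit), y)
```

Here the model is exactly the true one, so the low AUC reflects the randomness of the outcome itself, not a modeling failure.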