Publications by Arthur Charpentier
Combining automatically factor levels with trees
Last year, in a post, I discussed how to merge levels of factor variables using combinatorial techniques (it was for my STT5100 course, and trees are not in the syllabus), with an extension on trees at the end of the post. Consider the following (simulated) dataset: n=200; set.seed(1); x1=runif(n); x2=runif(n); y=1+2*x1-x2+rnorm(n,0,.2); LB=sample...
1628 sym R (1290 sym/6 pcs) 4 img
On the conjugate function
In the MAT7381 course (graduate course on regression models), we will talk about optimization, and a classical tool is the so-called conjugate. Given a function \(f:\mathbb{R}^p\to\mathbb{R}\), its conjugate is the function \(f^{\star}:\mathbb{R}^p\to\mathbb{R}\) such that \(f^{\star}(\boldsymbol{y})=\max_{\boldsymbol{x}}\lbrace\boldsymbol{x}^\top\bol...
4027 sym 10 img
On Cochran Theorem (and Orthogonal Projections)
Cochran's theorem (from "The distribution of quadratic forms in a normal system, with applications to the analysis of covariance", published in 1934) is probably the most important one in a regression course. It is an application of a nice result on quadratic forms of Gaussian vectors. More precisely, we can prove that if \(\boldsymbol{Y}\sim\mathc...
6457 sym 6 img
Quantile Regression (home made, part 2)
A few months ago, I posted a note with some homemade code for quantile regression… there was something odd in the output, but it was because there was a (small) mathematical problem in my equation. So since I should teach this tomorrow, let me fix it. Median: consider a sample \(\{y_1,\cdots,y_n\}\). To compute the median, solve \(\min_\mu \...
3771 sym R (1883 sym/10 pcs) 2 img
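As a rough illustration of the idea in the excerpt above, the sketch below checks numerically that the sample median minimizes the sum of absolute deviations \(\sum_i |y_i-\mu|\). This is the standard characterization of the median, and it is an assumption about what the truncated formula states. The simulated sample, the name `abs_loss`, and the use of `optimize()` are illustrative choices and not necessarily the approach taken in the post.

```r
# Minimal sketch: the median as the minimizer of the sum of absolute deviations.
# The data and the helper name `abs_loss` are illustrative assumptions.
set.seed(1)
y <- rnorm(101)                      # odd sample size, so the minimizer is unique

abs_loss <- function(mu) sum(abs(y - mu))

# One-dimensional minimization over the range of the sample
fit <- optimize(abs_loss, interval = range(y))

# The two values should agree up to the tolerance of optimize()
c(optimize = fit$minimum, median = median(y))
```

A one-dimensional search is enough here because the objective is convex in \(\mu\); for a quantile other than the median, the absolute value would be replaced by the corresponding asymmetric loss.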
Lasso Regression (home made)
To compute Lasso regression, that is, to minimize \(\frac{1}{2}\|\mathbf{y}-\mathbf{X}\mathbf{\beta}\|_{\ell_2}^2+\lambda\|\mathbf{\beta}\|_{\ell_1}\), define the soft-thresholding function \(S(z,\gamma)=\text{sign}(z)\cdot(|z|-\gamma)_+=\begin{cases}z-\gamma&\text{ if }\gamma<|z|\text{ and }z>0\\z+\gamma&\text{ if }\gamma<|z|\text{ and }z<0\\0&\text{ if }\gamma\geq|z|\end{cases}\) soft_thresholding = function(x,a){ sign(x) * p...
1997 sym R (1543 sym/6 pcs) 2 img
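The soft-thresholding operator \(S(z,\gamma)=\text{sign}(z)\cdot(|z|-\gamma)_+\) from the excerpt above can be sketched in a couple of lines of base R. The function name `soft_threshold` and its argument names are illustrative; the post's own code is truncated in this listing, so this is not a copy of it.

```r
# Minimal sketch of S(z, gamma) = sign(z) * (|z| - gamma)_+
# (names are illustrative; pmax() gives the positive part elementwise)
soft_threshold <- function(z, gamma) {
  sign(z) * pmax(abs(z) - gamma, 0)
}

z <- c(-3, -0.5, 0, 0.5, 3)
shrunk <- soft_threshold(z, gamma = 1)
shrunk
# [1] -2  0  0  0  2
```

Values with \(|z|\leq\gamma\) are set to zero and the others are shrunk toward zero by \(\gamma\), which is what produces sparse coefficients when the operator is applied coordinate by coordinate.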
Testing for a causal effect (with 2 time series)
A few days ago, I came back to a sentence I found (in a French newspaper), where someone was claiming that “… an old variable explains 85% of the change in a new variable. So we can talk about causality” and I tried to explain that it was just stupid: if we consider the regression of the temperature on day \(t+1\) against the number of cyc...
4809 sym R (2700 sym/13 pcs) 4 img
Function basis and regression
In the first part of the course on linear models, we’ve seen how to construct a linear model when the vector of covariates \(\boldsymbol{x}\) is given, so that \(\mathbb{E}(Y|\boldsymbol{X}=\boldsymbol{x})\) is either simply \(\boldsymbol{x}^\top\boldsymbol{\beta}\) (for standard linear models) or a functional of \(\boldsymbol{x}^\top\boldsymbo...
7077 sym R (4878 sym/12 pcs) 16 img
Modeling pandemics (1)
The most popular model for epidemics is the so-called SIR model, also known as the Kermack-McKendrick model. Consider a population of size \(N\), and assume that \(S\) is the number of susceptible, \(I\) the number of infectious, and \(R\) the number of recovered (or immune) individuals, \(\displaystyle {\begin{aligned}&{\frac {dS}{dt}}=-{\frac {\beta IS}{N}...
2727 sym R (1514 sym/9 pcs) 6 img
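To make the dynamics concrete, here is a minimal sketch that integrates the SIR system with an explicit Euler scheme in base R. Only the first equation is visible in the excerpt above; the other two (\(dI/dt=\beta IS/N-\gamma I\) and \(dR/dt=\gamma I\)) are the standard textbook form and are assumed here. The function name `sir_euler`, the parameter values, and the step size are illustrative, and the post itself may well rely on a dedicated ODE solver instead.

```r
# Minimal sketch: explicit Euler integration of the standard SIR model.
# Assumed equations: dS/dt = -beta*I*S/N, dI/dt = beta*I*S/N - gamma*I, dR/dt = gamma*I
sir_euler <- function(beta, gamma, N, I0, t_max, dt = 0.01) {
  n_steps <- round(t_max / dt)
  S <- numeric(n_steps + 1)
  I <- numeric(n_steps + 1)
  R <- numeric(n_steps + 1)
  S[1] <- N - I0
  I[1] <- I0
  R[1] <- 0
  for (k in seq_len(n_steps)) {
    new_inf <- beta * I[k] * S[k] / N   # flow from S to I
    new_rec <- gamma * I[k]             # flow from I to R
    S[k + 1] <- S[k] - dt * new_inf
    I[k + 1] <- I[k] + dt * (new_inf - new_rec)
    R[k + 1] <- R[k] + dt * new_rec
  }
  data.frame(time = (0:n_steps) * dt, S = S, I = I, R = R)
}

# Illustrative parameter values (not taken from the post)
out <- sir_euler(beta = 0.5, gamma = 0.1, N = 1000, I0 = 1, t_max = 100)
head(out)
```

Because the three flows cancel, \(S+I+R\) stays equal to \(N\) at every step, which is a quick sanity check on any implementation.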
Modeling pandemics (2)
When introducing the SIR model, in our initial post, we got an ordinary differential equation, but we did not really discuss stability and periodicity. It has to do with the Jacobian matrix of the system. But first of all, we had three equations for three functions, but actually \(\displaystyle{{\frac{dS}{dt}}+{\frac {dI}{dt}}+{\frac {dR}{dt}}=0}\...
2381 sym R (1013 sym/7 pcs) 6 img
Modeling Pandemics (3)
In Statistical Inference in a Stochastic Epidemic SEIR Model with Control Intervention, a more complex model than the one we’ve seen yesterday was considered (and is called the SEIR model). Consider a population of size \(N\), and assume that \(S\) is the number of susceptible, \(E\) the number of exposed, \(I\) the number of infectious, and \(...
3536 sym R (1186 sym/4 pcs) 4 img