Publications by insightr
LASSO, adaLASSO and the GLMNET package
By Gabriel Vasconcelos Motivation If you are close to the data science world you have probably heard about LASSO. It stands for Least Absolute Shrinkage and Selection Operator. The LASSO is a model that uses a penalization on the size of the parameters in the objective function to try to exclude irrelevant variables from the model. It has two very na...
3571 sym R (987 sym/5 pcs) 12 img
Introducing the ArCo package
By Gabriel Vasconcelos What is the ArCo?? We recently launched the R package ArCo. It is an implementation of the Artificial Counterfactual method proposed by Carvalho, Masini and Medeiros (2016). This post will review some of its features and show how simple it is to estimate “what would have happened” if something “had not happened”. C...
5393 sym R (1023 sym/4 pcs) 8 img
A little R prank
By Gabriel Vasconcelos R functions The book Advanced R, by Hadley Wickham, shows a very interesting statement: “To understand R, two slogans are helpful: Everything that exists is an object. Everything that happens is a function call.” – John Chambers In other words, every action is caused by a function, and functions are things you can...
1111 sym R (164 sym/2 pcs) 4 img
American Bond Yields and Principal Component Analysis
By Yuri Fonseca The idea of this post is to give an empirical example of how Principal Component Analysis (PCA) can be applied in Finance, especially in the Fixed Income Market. Principal components are very useful to reduce data dimensionality and give a joint interpretation to a group of variables. For example, one could use it to try to extra...
5460 sym R (4851 sym/8 pcs) 54 img
Realy, Realy Big VARs
By Gabriel Vasconcelos Overview If you have studied Vector Autoregressive (VAR) models you are probably familiar with the “curse of dimensionality” (CD). It is very frustrating to see how ordinary least squares (OLS) fails to produce reliable results even for moderate-size VARs. For those who are new to VARs, the CD means that the number of p...
3366 sym R (1488 sym/4 pcs) 24 img
Problems of causal inference after selecting of controls
By Gabriel Vasconcelos Inference after model selection In many cases, when we want to estimate some causal relationship between two variables we have to solve the problem of selecting the right control variables. If we fail, our results will be very fragile and the estimator potentially biased because we left some important control variables out....
4155 sym R (2491 sym/2 pcs) 56 img
Bagging, the perfect solution for model instability
By Gabriel Vasconcelos Motivation The name bagging comes from bootstrap aggregating. It is a machine learning technique proposed by Breiman (1996) to increase stability in potentially unstable estimators. For example, suppose you want to run a regression with a few variables in two steps. First, you run the regression with all the variables in you...
3734 sym R (1796 sym/4 pcs) 40 img
Complete Subset Regressions, simple and powerful
By Gabriel Vasconcelos Complete subset regressions (CSR) is a forecasting method proposed by Elliott, Gargano and Timmermann in 2013. It is a very simple but powerful technique. Suppose you have a set of variables and you want to forecast one of them using information from the others. If your variables are highly correlated and the variable...
3264 sym R (1580 sym/3 pcs) 28 img
Non gaussian time-series, let’s handle it with score driven models!
By Henrique Helfer Motivation Until very recently, only a very limited class of feasible non-Gaussian time series models was available. For example, one could use extensions of state space models to non-Gaussian environments (see, for example, Durbin and Koopman (2012)), but extensive Monte Carlo simulation is required to numerically evaluate ...
7212 sym Python (3979 sym/8 pcs) 103 img
When the LASSO fails???
By Gabriel Vasconcelos When the LASSO fails? The LASSO has two important uses: the first is forecasting and the second is variable selection. We are going to talk about the second. The objective of variable selection is to recover the correct set of variables that generate the data, or at least the best approximation given the candidate variables. T...
5449 sym R (1999 sym/3 pcs) 46 img