Publications by insightr

LASSO, adaLASSO and the GLMNET package

06.04.2017

By Gabriel Vasconcelos. Motivation: If you are close to the data science world, you have probably heard about LASSO. It stands for Least Absolute Shrinkage and Selection Operator. The LASSO is a model that penalizes the size of the parameters in the objective function to try to exclude irrelevant variables from the model. It has two very na...

3571 sym R (987 sym/5 pcs) 12 img

Introducing the ArCo package

06.04.2017

By Gabriel Vasconcelos. What is the ArCo? We recently launched the R package ArCo. It is an implementation of the Artificial Counterfactual method proposed by Carvalho, Masini and Medeiros (2016). This post will review some of its features and show how simple it is to estimate “what would have happened” if something “had not happened”. C...

5393 sym R (1023 sym/4 pcs) 8 img

A little R prank

11.04.2017

By Gabriel Vasconcelos. R functions: The book Advanced R, by Hadley Wickham, quotes a very interesting statement: “To understand computations in R, two slogans are helpful: Everything that exists is an object. Everything that happens is a function call.” – John Chambers. In other words, every action is caused by a function, and functions are things you can...

1111 sym R (164 sym/2 pcs) 4 img

American Bond Yields and Principal Component Analysis

13.04.2017

By Yuri Fonseca. The idea of this post is to give an empirical example of how Principal Component Analysis (PCA) can be applied in Finance, especially in the Fixed Income Market. Principal components are very useful for reducing data dimensionality and giving a joint interpretation to a group of variables. For example, one could use it to try to extra...

5460 sym R (4851 sym/8 pcs) 54 img

Really, Really Big VARs

27.04.2017

By Gabriel Vasconcelos. Overview: If you have studied Vector Autoregressive (VAR) models, you are probably familiar with the “curse of dimensionality” (CD). It is very frustrating to see how ordinary least squares (OLS) fails to produce reliable results even for moderate-size VARs. For those who are new to VARs, the CD means that the number of p...

3366 sym R (1488 sym/4 pcs) 24 img

Problems of causal inference after selection of controls

12.05.2017

By Gabriel Vasconcelos. Inference after model selection: In many cases, when we want to estimate some causal relationship between two variables, we have to solve the problem of selecting the right control variables. If we fail, our results will be very fragile and the estimator potentially biased because we left some important control variables out....

4155 sym R (2491 sym/2 pcs) 56 img

Bagging, the perfect solution for model instability

22.05.2017

By Gabriel Vasconcelos. Motivation: The name bagging comes from bootstrap aggregating. It is a machine learning technique proposed by Breiman (1996) to increase stability in potentially unstable estimators. For example, suppose you want to run a regression with a few variables in two steps. First, you run the regression with all the variables in you...

3734 sym R (1796 sym/4 pcs) 40 img

Complete Subset Regressions, simple and powerful

31.05.2017

By Gabriel Vasconcelos. Complete subset regressions (CSR) is a forecasting method proposed by Elliott, Gargano and Timmermann in 2013. It is a very simple but powerful technique. Suppose you have a set of variables and you want to forecast one of them using information from the others. If your variables are highly correlated and the variable...

3264 sym R (1580 sym/3 pcs) 28 img

Non-Gaussian time series, let’s handle them with score-driven models!

07.06.2017

By Henrique Helfer. Motivation: Until very recently, only a very limited class of feasible non-Gaussian time series models was available. For example, one could use extensions of state space models to non-Gaussian environments (see, for example, Durbin and Koopman (2012)), but extensive Monte Carlo simulation is required to numerically evaluate ...

7212 sym Python (3979 sym/8 pcs) 103 img

When the LASSO fails???

14.06.2017

By Gabriel Vasconcelos. When the LASSO fails? The LASSO has two important uses: the first is forecasting and the second is variable selection. We are going to talk about the second. The objective of variable selection is to recover the correct set of variables that generate the data, or at least the best approximation given the candidate variables. T...

5449 sym R (1999 sym/3 pcs) 46 img