Publications by insightr

Tuning xgboost in R: Part II

28.07.2018

By Gabriel Vasconcelos In this previous post I discussed some of the parameters we have to tune to estimate a boosting model using the xgboost package. In this post I will discuss the two parameters that were left out of Part I: gamma and min_child_weight. These two parameters are much less obvious to understand but they can si...

2973 sym R (2233 sym/5 pcs) 14 img

Introducing the HCmodelSets Package

04.08.2018

By Henrique Helfer Hoeltgebaum Introduction I am happy to introduce the package HCmodelSets, which is now available on CRAN. This package implements the methods proposed by Cox, D.R. and Battey, H.S. (2017). In particular it performs the reduction, exploratory and model selection phases given in the aforementioned reference. The software support...

5511 sym R (1563 sym/5 pcs) 6 img

BooST (Boosting Smooth Trees) a new Machine Learning Model for Partial Effect Estimation in Nonlinear Regressions

14.08.2018

By Gabriel Vasconcelos and Yuri Fonseca We are happy to introduce our new machine learning method called Boosting Smooth Trees (BooST) (full article here). This model is joint work with professors Marcelo Medeiros and Álvaro Veiga. BooST uses a different type of regression tree that allows us to estimate the derivatives of very general n...

3897 sym R (54 sym/1 pcs) 16 img

BooST series I: Advantage in Smooth Functions

20.08.2018

By Gabriel Vasconcelos and Yuri Fonseca Introduction This is the first of a series of posts on the BooST (Boosting Smooth Trees). If you missed the first post introducing the model, click here, and if you want to see the full article, click here. The BooST is a model that uses Smooth Trees as base learners, which makes it possible to approximate the...

5294 sym R (2872 sym/13 pcs) 32 img

Growing Objects and Loop Memory Pre-Allocation

23.08.2018

By Thiago Milagres Preallocating Memory This will be a short post about a simple but very important concept that can drastically increase the speed of poorly written code. It is very common to see R loops written as follows: v = NULL; n = 1e5; for (i in 1:n) v = c(v, i). This seems like a natural way to write such a task: at each iteration, we in...

2575 sym R (3556 sym/8 pcs) 2 img

BooST series II: Pricing Optimization

01.10.2018

By Gabriel Vasconcelos & Yuri Fonseca Introduction This post is the second of a series of examples of the BooST (Boosting Smooth Trees) model. You can see an introduction to the model here and the first example here. Our objective in this post is to use the derivatives of the BooST to obtain prices that maximize the profit for a given set of pro...

5217 sym R (3213 sym/9 pcs) 44 img

Benford’s Law for Fraud Detection with an Application to all Brazilian Presidential Elections from 2002 to 2018

17.11.2018

By Gabriel Vasconcelos and Yuri Fonseca The intuition Let us begin with a brief explanation of Benford’s law and why it should work as a fraud detection method. Given a set of numbers, the first thing we need to do is to extract the first digit of each number. For example, for (121, 245, 12, 55) the first digits will be (1, 2, 1, 5). Perhaps our i...

17900 sym R (11822 sym/33 pcs) 60 img

Structural Analysis of Bayesian VARs with an example using the Brazilian Development Bank

05.01.2019

By Gabriel Vasconcelos Introduction Vector Autoregressive (VAR) models are very popular in economics because they can model a system of economic variables and relations. Bayesian VARs are receiving a lot of attention due to their ability to deal with larger systems and the smart use of priors. For example, in this old post I showed an example of...

6250 sym R (1599 sym/6 pcs) 42 img

Basic Quantile Regression

12.08.2019

By Gabriel Vasconcelos Introduction Today we are going to talk about quantile regression. When we use the lm command in R we are fitting a linear regression using Ordinary Least Squares (OLS), which has the interpretation of a model for the conditional mean of y on x. However, sometimes we may need to look at more than the conditional mean to unde...

4074 sym R (12377 sym/6 pcs) 30 img