Publications by insightr
How Random Forests improve simple Regression Trees?
By Gabriel Vasconcelos Regression Trees In this post I am going to discuss some features of Regression Trees and Random Forests. Regression Trees are known to be very unstable; in other words, a small change in your data may drastically change your model. The Random Forest uses this instability as an advantage through bagging (you can see details ...
4614 sym R (1627 sym/6 pcs) 14 img
Uber assignment with lpSolve
By Yuri Fonseca In this post we are going to build an Uber assignment simulation and use it to calculate some waiting-time metrics. Setting Suppose we live in a 100×100 block city where each block takes 1 minute to cross by car. Drivers can pick up passengers only on corners, and passengers must call Uber on corners. Inferior-left co...
2571 sym R (3385 sym/4 pcs) 8 img
Writing Julia functions in R with examples
By Gabriel Vasconcelos The Julia programming language is growing fast, and its efficiency and speed are now well known. Even though I think R is the best language for Data Science, sometimes we just need more. Modelling is an important part of Data Science and sometimes you may need to implement your own algorithms or adapt existing models to your...
4351 sym R (5221 sym/6 pcs) 4 img
Formal ways to compare forecasting models: Rolling windows
By Gabriel Vasconcelos Overview When working with time-series forecasting we often have to choose between a few potential models, and the best way is to test each model in pseudo-out-of-sample estimations. In other words, we simulate a forecasting situation where we drop some data from the estimation sample to see how each model performs. Natural...
4368 sym R (3086 sym/5 pcs) 16 img
A crazy day in the Bitcoin World
By Gabriel Vasconcelos Today, November 29, 2017, was a crazy day in the Bitcoin world, and the craziness is still going on as I write this post. The price swung by thousands of dollars within a few hours. Bitcoin was the main topic today in every discussion group I participate in. Some people believe we are in the middle of a giant bubble and are ve...
2611 sym R (3473 sym/1 pcs) 8 img 1 tbl
Using the tuber package to analyse a YouTube channel
By Gabriel Vasconcelos So I decided to have a quick look at the tuber package to extract YouTube data in R. My cousin is a singer (a hell of a good one) and he has a YouTube channel (dan vasc), which I strongly recommend, where he posts his covers. I will focus my analysis on his channel. The tuber package is very friendly and it downloads YouTu...
3353 sym R (4663 sym/1 pcs) 12 img 2 tbl
Direct forecast X Recursive forecast
By Gabriel Vasconcelos When dealing with forecasting models there is an issue that generates a lot of confusion, which is the difference between direct and recursive forecasts. I believe most people are more used to recursive forecasts because they are the first we learn when studying ARIMA models. Suppose you want to forecast the variable , st...
5313 sym R (2513 sym/6 pcs) 80 img
Parametric Portfolio Policies
By Gabriel Vasconcelos Overview There are several ways to do portfolio optimization, each with its advantages and disadvantages. We already discussed some techniques here. Today I am going to show another method to perform portfolio optimization that works very well on large datasets because it produces very robust weights, which resul...
7776 sym R (6143 sym/10 pcs) 58 img
Tuning xgboost in R: Part I
By Gabriel Vasconcelos Before we begin, I would like to thank Anuj for kindly including our blog in his list of the top 40 R blogs! Check out the full list at his page, FeedSpot! Introduction Tuning a Boosting algorithm for the first time can be a very confusing task. There are so many parameters to choose and they all have different behaviour on...
8003 sym R (3785 sym/9 pcs) 22 img
Different demand functions and optimal price estimation in R
By Yuri Fonseca Demand models In the previous post about pricing optimization (link here), we discussed a little about linear demand and how to estimate optimal prices in that case. In this post we are going to compare three different types of demand models for homogeneous products and how to find optimal prices for each one of them. For the li...
3940 sym R (4105 sym/7 pcs) 99 img