Publications by Vik Paruchuri

Using R in Ruby

10.01.2012

Integrating R into more traditional programming languages can be incredibly rewarding due to R’s powerful built-in statistical tools, but it can also be extremely frustrating at times. Thankfully, like much else to do with Ruby, integrating R and Ruby is quite a simple process. To begin, install the gem rinruby and require it in y...

2396 sym R (306 sym/8 pcs)

Introduction to Kaggle Algorithmic Trading Challenge

10.01.2012

I recently participated in the Kaggle Algorithmic Trading Competition under the username VikP. For those who do not know what Kaggle is, it is a web site where individuals and corporations can host data analysis competitions. This particular competition involved the prediction of how the prices of 50,000 observations of 102 differen...

2088 sym

Time Series Cointegration in R

10.01.2012

Cointegration can be a valuable tool in determining the mean-reverting properties of two time series. A full description of cointegration can be found on Wikipedia. Essentially, it seeks to find stationary linear combinations of the two vectors. The R code below, which has been modified from here, will test two series for integration ...

1709 sym R (195 sym/1 pcs)

Parallel R Loops in Windows and Linux

17.01.2012

Parallel computation may seem difficult to implement and a pain to use, but it is actually quite simple. The foreach package provides the basic loop structure, which can utilize various parallel backends to execute the loop in parallel. First, let’s go over the basic structure of a foreach loop. To get the foreach package, run the follow...

2979 sym R (642 sym/11 pcs)

Parallel R Loops for Windows and Linux

17.01.2012

Parallel computation may seem difficult to implement and a pain to use, but it is actually quite simple. The foreach package provides the basic loop structure, which can utilize various parallel backends to execute the loop in parallel. First, let’s go over the basic structure of a foreach loop. To get the foreach package,...

3022 sym R (644 sym/11 pcs)

Time Based Arbitrage Opportunities in Tick Data

17.01.2012

I recently posted an introduction to the Kaggle Algorithmic Trading Challenge, which I competed in. I said that I would post about my experiences, and this is hopefully the first of a series. We were given tick data from the London Stock Exchange (specifically, the FTSE 100) over random time intervals during parts of 37 days. Each dat...

4159 sym R (169 sym/1 pcs) 8 img

Improve Predictive Performance in R with Bagging

18.01.2012

Bagging, aka bootstrap aggregation, is a relatively simple way to increase the power of a predictive statistical model by taking multiple random samples (with replacement) from your training data set, and using each of these samples to construct a separate model and separate predictions for your test set. These predictions are then av...

5102 sym R (1310 sym/6 pcs)
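
The resampling idea in the bagging excerpt above can be illustrated with a minimal sketch. It is written in Python rather than the post's R, it is not the post's code, and everything in it is an assumption made for illustration: the names fit_line and bagged_predict are hypothetical, the base model is a simple least-squares line, and the per-model predictions are combined by a plain average, which is the usual bagging step.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b). Hypothetical helper."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate resample (every x identical): fall back to the mean of y.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bagged_predict(train_x, train_y, test_x, n_models=50, seed=0):
    """Average the test-set predictions of models fit on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(train_x)
    totals = [0.0] * len(test_x)
    for _ in range(n_models):
        # Draw n row indices with replacement: one bootstrap sample.
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([train_x[i] for i in idx], [train_y[i] for i in idx])
        # Each model makes its own predictions for the test set.
        for j, x in enumerate(test_x):
            totals[j] += a + b * x
    # Combine the separate predictions by averaging them.
    return [t / n_models for t in totals]


if __name__ == "__main__":
    xs = list(range(20))
    ys = [2 * x + 1 + random.Random(42).gauss(0, 1) for x in xs]
    print(bagged_predict(xs, ys, [20, 25]))
```

A fixed seed keeps the resampling reproducible. Any base model could replace fit_line; a straight line is used here only to keep the sketch short.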

Intro to Ensemble Learning in R

19.01.2012

Introduction: This post incorporates parts of yesterday’s post about bagging. If you are unfamiliar with bagging, I suggest that you read it before continuing with this article. I would like to give a basic overview of ensemble learning. Ensemble learning involves combining multiple predictions derived by different techniques in order to create ...

5967 sym R (2695 sym/10 pcs)

Analyzing Federal Bailout Recipients in R

19.01.2012

I was searching for open data recently, and stumbled on Socrata. Socrata has a lot of interesting data sets, and while I was browsing around, I found a data set on federal bailout recipients. Here is the data set. However, data sets on Socrata are not always the most recent versions, so I followed a link to the data source at Propublica, where I ...

4885 sym R (1809 sym/10 pcs) 6 img

An Intro to Ensemble Learning in R

19.01.2012

Introduction: This post incorporates parts of yesterday’s post about bagging. If you are unfamiliar with bagging, I suggest that you read it before continuing with this article. I would like to give a basic overview of ensemble learning. Ensemble learning involves combining multiple predictions derived by different techniques in order to creat...

6013 sym R (2695 sym/10 pcs)