Publications by Vik Paruchuri

Analyzing Federal Government Bailout Recipients in R

19.01.2012

I was searching for open data recently, and stumbled on Socrata. Socrata has a lot of interesting data sets, and while I was browsing around, I found a data set on federal bailout recipients. Here is the data set. However, data sets on Socrata are not always the most recent versions, so I followed a link to the data source at Propub...

4929 sym R (2492 sym/10 pcs) 6 img

R Regression Diagnostics Part 1

20.01.2012

Linear regression can be a fast and powerful tool to model complex phenomena. However, it makes several assumptions about your data, such as the assumption that a linear relationship exists between the predictors and the dependent variable, and quickly breaks down when these assumptions are violated. In this post, I will introduce so...

3067 sym R (264 sym/3 pcs) 4 img

Analyzing US Government Contract Awards in R

23.01.2012

As I was exploring open data sources, I came across USAspending.gov. This site contains information on US government contract awards and other disbursements, such as grants and loans. In this post, we will look at data on contracts awarded in the state of Maryland in the fiscal year 2011, which is available by selecting “Maryland” ...

4434 sym R (1823 sym/6 pcs) 6 img

Parallel R Model Prediction Building and Analytics

26.01.2012

Modifying R code to run in parallel can lead to huge performance gains. Although a significant amount of code can easily be run in parallel, there are some learning techniques, such as the Support Vector Machine, that cannot be easily parallelized. However, there is an often overlooked way to speed up these and other models. It inv...

4137 sym R (1338 sym/8 pcs)

Using LaTeX, R, and Sweave to Create Reports in Windows

30.01.2012

LaTeX is a typesetting system that can easily be used to create reports and scientific articles, and has excellent formatting options for displaying code and mathematical formulas. Sweave is a tool included with base R that can execute R code embedded in LaTeX files and display the output. This can be used to generate reports and quickly fix errors whe...

4050 sym 6 img

Monitoring Progress Inside a Foreach Loop

09.02.2012

The foreach package for R is excellent, and allows for code to easily be run in parallel. One problem with foreach is that it creates new Rscript instances for each iteration of the loop, which prevents status messages from being logged to the console output. This is particularly frustrating during long-running tasks, when we are of...

1781 sym R (982 sym/3 pcs)

Loading and/or Installing Packages Programmatically

08.05.2012

In R, the traditional way to load packages can sometimes lead to situations where several lines of code need to be written just to load packages. These lines can cause errors if the packages are not installed, and can also be hard to maintain, particularly during deployment. Fortunately, there is a way to create a function in R that ...

3485 sym R (933 sym/8 pcs)

Mapping US Radiation Levels in R

08.05.2012

I have posted previously about the open data available on Socrata (https://opendata.socrata.com/), and I was looking at the site again today when I stumbled upon a listing of levels of various radioactive isotopes by US city and state. The data is available at https://opendata.socrata.com/Government/Sorted-RadNet-Laboratory-Analysis/...

6192 sym R (2953 sym/11 pcs) 8 img

Predicting the NBA Finals with R

30.05.2012

This is the initial post about the algorithm. See updates 1, 2, and 3 for more. The algorithm is currently 4-2 in the playoffs! Overview: I was struck by Martin O’Leary’s recent post on predicting the Eurovision finals, which led me to decide that I would try to predict NBA games using mathematical models. As the finals are ongoin...

6743 sym 16 img

Predicting NBA Playoff Games – Results and Update 1

01.06.2012

Game Results: I recently made a post about developing an algorithm to predict the NBA playoffs, and I concluded with 2 predictions. Although Miami beat the Celtics to make my algorithm 1-0 in terms of predictions, it fell to 1-1 when the Thunder beat the Spurs. So, we are now at .500. Considering that the algorithm was about 61.5% accurate over...

3559 sym 8 img