Publications by stathack

Converting cross sectional data with dates to weekly averages in R.

30.05.2012

I was recently confronted with a problem where I had to compare two very different data sets. One data set was cross-sectional data with dates, observed over the course of three months, and the other was weekly averages during those same three months. After a bit of research, I discovered that there is a very simple way to conv...

Fun with geocoding and mapping in JGR

31.07.2012

For a recent project I had to do some mapping of addresses, but I didn’t have their lat/lons to use the Deducer and DeducerSpatial packages in R JGR. After frustrating myself trying to adapt this code from stackoverflow.com, I found a much easier way of geocoding using the dismo and XML packages in R. First you need to have the complete addre...

Querying a database from within R

18.08.2012

For a while now I have been contemplating pulling data from our PostgreSQL db directly from R, but just never actually pulled the trigger until today. What I found was that it was a lot easier than I ever could have imagined. My laptop was already on the VPN, so I decided to try it locally before deploying our RStudio server. After a bit o...

Presidential Candidate Sentiment Analysis

07.10.2012

After watching the Presidential debates and hearing all the opinions on how the candidates performed, I got the harebrained idea of creating a simple function that would automate pulling down tweets for each candidate, analyze the positivity or negativity of the tweets, and then graph them out. This project turned out to be a lot easier th...

Minute by Minute Twitter Sentiment Timeline from the VP debate

12.10.2012

Background: The data for this graph was collected automatically every ~60 seconds during the VP debate on 10/11/2012, with an ending aggregate sample size of 363,163 tweets. From this dataset duplicate tweets were removed (because of bots), which gave a final dataset of 81,124 unique tweets (52,303-Biden, 2...

Twitter Analysis of the US Presidential Debate

17.10.2012

The following are word clouds of tweets for each candidate from the October 16, 2012 debate; the bigger a word appears, the more often it was used in tweets (click on each word cloud to enlarge): And the net-negative posts for each candidate: Please note that the bigger the word is in the word cloud, the more often it was used. The R code for creat...

Top Facebook Posts During the US Presidential Debate

22.10.2012

The following data was collected during the Presidential Debate on the 22nd of October by tapping into the Facebook social graph API using R. The top three posted links during the debate for each candidate are: Obama- #1 http://bit.ly/QCODJg #2 http://bit.ly/RXstnm #3 http://bit.ly/P8MmJ1 Romney- #1 http://bit.ly/zDdsKf #2 ...

Building a Simple Web App using R

13.11.2012

I’ve been interested in building a web app using R for a while, but never put any time into it until I was informed of the Shiny package. It looked too easy, so I absolutely had to try it out. First you need to install the package from the command line. options(repos=c(RStudio="http://rstudio.org/_packages", getOption("repos"))) install.pac...

Mapping Current Average Price Per Sqft for Rentals by Zip in San Fran

25.11.2012

My company, Kwelia, is sitting on mountains of data, so I decided to try my hand at mapping. I have played around with JGR but it’s just too buggy, at least on my Mac, so I went looking for alternatives and found a good write-up here. I decided on mapping prices per sqft for apartment rentals by zip code in the Bay Area because we ar...

Opening Large CSV Files in R

26.12.2012

Before heading home for the holidays, I had a large data set (1.6 GB with over 1.25 million rows) with columns of text and integers pulled out of the company (Kwelia) database and put into a .csv file, since I was going to be offline a lot over the break. I tried opening the csv file in the usual way: all <- read.csv("file.csv") However, it neve...
