Publications by Andrew Landgraf
Quick Post About Getting and Plotting Polls in R
With the election nearly upon us, I wanted to share an easy way I just found to download polling data and graph a few polls with ggplot2. dlinzer at GitHub created a function to download poll data from the Huffington Post’s Pollster API. The default is to download national tracking polls from the presidential election. After sourcing the f...
2130 sym R (1362 sym/3 pcs) 6 img
Factor Analysis of Baseball’s Hall of Fame Voters
Recently, Nate Silver wrote a post which analyzed how voters who voted for and against Barry Bonds for Baseball's Hall of Fame differed. Not surprisingly, those who voted for Bonds were more likely to vote for other suspected steroid users (like Roger Clemens). This got me thinking that this wo...
4139 sym R (2168 sym/7 pcs) 14 img
Restricted Boltzmann Machines in R
Restricted Boltzmann Machines (RBMs) are an unsupervised learning method (like principal components). An RBM is a probabilistic, undirected graphical model. RBMs are becoming more popular in machine learning due to recent success in training them with contrastive divergence. They have proven useful in collaborative filtering, being one...
3348 sym R (3361 sym/3 pcs) 2 img
Copying Data from Excel to R and Back
Often we are given a data set in Excel format and we want to run a quick analysis using R’s functionality to look at advanced statistics or make better visualizations. There are packages for importing/exporting data from/to Excel, but I have found them to be hard to work with or to work only with old versions of Excel (*.xls, ...
1930 sym R (281 sym/2 pcs) 2 img
What Is the Probability of a 16 Seed Beating a 1 Seed?
Note: I started this post way back when the NCAA men’s basketball tournament was going on, but didn’t finish it until now. Since the NCAA Men’s Basketball Tournament moved to 64 teams, a 16 seed has never upset a 1 seed. You might be tempted to say that the probability of such an event must be 0 then. But we know better than that. In this...
5971 sym R (4551 sym/8 pcs) 14 img
Downloading and Analyzing CD1025’s Playlist
CD1025 is an “alternative” radio station here in Columbus. They are one of the few remaining radio stations that are independently owned, and they take great pride in it. For data nerds like me, they also put a real-time list of recently played songs on their website. The page has the most recent 50 songs played, but you can also click on “O...
2813 sym R (3813 sym/7 pcs) 6 img
When Did CD102.5 Book the Summerfest Artists?
424 sym
Top Songs by Artist on CD102.5 in 2013
In a previous post, I showed you how to scrape playlist data from Columbus, OH alternative rock station CD102.5. Since it’s the end of the year and best-of lists are all the fad, I thought I would share the most popular songs and artists of the year, according to this data. In addition to this, I am going to make an interactive grap...
2191 sym R (3603 sym/7 pcs) 6 img
Yet Another Baseball Defense Statistic
Fangraphs recently published an interesting dataset that measures defensive efficiency of fielders. For each player, the Inside Edge dataset breaks their opportunities to make plays into five categories, ranging from almost impossible to routine. It also records the proportion of times that the player successfully made the play. With this data, w...
8005 sym 10 img
Time Stacking and Time Slicing in R
Time lapses are a fun way to quickly show a long period of time. They typically involve setting up your camera on a tripod and taking photos at a regular interval, like every 5 seconds. After all the photos have been taken, they are combined into a movie at a much faster rate, for example 30 frames per second. Time stacking is a way to combine a...
3007 sym 10 img