Publications by Ryan
Anecdotal Evidence that Facebook Stores all Clicks?
This is not really news. A few months ago, news broke that Facebook recorded each user’s clicks and profile views in a database. Of course, I am not at all surprised. I would be more surprised if they didn’t store every single click. By now, most people have some sense as to how Facebook’s recommendation system works. It typically performs ...
Some LaTeX Gems – Part 1: TikZ, Loops and more
This logo means that the blog post is about something I have found interesting, but does not apply directly to the exact purpose of this blog. Note: These commands have been tested in pdflatex. I am not sure if they work in other distributions. Over the past couple of months, I have been assisting with editing some papers and also doing some pro...
My Experience at Hadoop Summit 2010 #hadoopsummit
This week I had the opportunity to trek up north to Silicon Valley to attend Yahoo’s Hadoop Summit 2010. I love Silicon Valley. The few times I’ve been there the weather was perfect (often warmer than LA), there was little to no traffic and no road rage, and people overall seemed friendly and happy. Not to mention there are so many trees it looks like a for...
Taking R to the Limit, Part I – Parallelization in R
Tuesday night I had the opportunity to present on high performance computing in R at the Los Angeles R Users’ Group. There was so much to talk about that I had to split my talk into two parts. The first part was parallelization and the second part will be big data (and a bit left over from parallelization including Hadoop). My slides are post...
Are MLB Games Getting Longer?
On July 29, 2010, I had a flight from Denver to Cincinnati. About an hour before boarding, I went to ESPN’s website and found a new article by Bill Simmons, a.k.a. The Sports Guy (@sportsguy33 on Twitter). The basic premise of this article is that a core group of fans is losing interest in Red Sox games this season. So he decides to assign...
Apologies and Style Guides
I have to say that it’s pretty exciting to watch your blog go from a few hits over its lifetime to getting almost 200 in a single day. I am currently negotiating with Google over the purchase of this blog. Or maybe not. Again, thanks be to @revodavid for posting to the Revolution Analytics Blog. Anyway, I just wanted to apologize for th...
Goals per Game in MLS
I promised something related to Major League Soccer and here it is. Caveat: It’s not much. Why so sparse? (1) The data is a bit messy due to teams folding, expansion, name changes, etc. (2) I was backpacking all weekend and didn’t have time to work on this side project. Yes, I have a real job and working during the work week is ...
A Rule Change in Major League Soccer?
I have to admit that working with my Major League Soccer data set has been slow going. There are a few reasons: (1) I have a full-time job at the National Renewable Energy Lab and (2) the data isn’t quite as “rich” as I initially thought. As an example, the MLS site doesn’t list the wins and losses for each team by year. That seem...
Taking R to the Limit: Large Datasets; Predictive modeling with PMML and ADAPA
During the first part of our meeting, Ryan Rosario presented on the topic of large datasets in R. Video, slides and code of the talk “Taking R to the Limit: Large Datasets” by Ryan Rosario at the Los Angeles area R Users Group in August 2010 are below. Slides are also available for PDF download here. R code is available here. More ...
Using XML package vs. BeautifulSoup
A while back I posted something about scraping a webpage using the BeautifulSoup module in Python. One of the comments to that post was by Larry — a blogger over at IEORTools — suggesting that I take a look at the XML library in R. Given that one of the points of this blog is to become more familiar with some of the R tools, it seemed lik...