Publications by Edwin Chen

Item-to-Item Collaborative Filtering with Amazon’s Recommendation System

14.02.2011

Introduction: In making its product recommendations, Amazon makes heavy use of an item-to-item collaborative filtering approach. This essentially means that for each item X, Amazon builds a neighborhood of related items S(X); whenever you buy/look at an item, Amazon then recommends items to you from that item’s neighborhood. That’s why when you s...

Prime Numbers and the Riemann Zeta Function

13.03.2011

Lots of people know that the Riemann Hypothesis has something to do with prime numbers, but most introductions fail to say what or why. I’ll try to give one angle of explanation. Layman’s Terms: Suppose you have a bunch of friends, each with an instrument that plays at a frequency equal to the imaginary part of a zero of the Riemann zeta funct...

Layman’s Introduction to Random Forests

13.03.2011

Suppose you’re very indecisive, so whenever you want to watch a movie, you ask your friend Willow if she thinks you’ll like it. In order to answer, Willow first needs to figure out what movies you like, so you give her a bunch of movies and tell her whether you liked each one or not (i.e., you give her a labeled training set). Then, when you ...

Netflix Prize Summary: Factorization Meets the Neighborhood

13.03.2011

(Way back when, I went through all the Netflix prize papers. I’m now (very slowly) trying to clean up my notes and put them online. Eventually, I hope to have a more integrated tutorial, but here’s a rough draft for now.) This is a summary of Koren’s 2008 Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Th...

Layman’s Introduction to Measure Theory

13.03.2011

Measure theory studies ways of generalizing the notions of length/area/volume. Even in 2 dimensions, it might not be clear how to measure the area of the following fairly tame shape [figure], much less the “area” of even weirder shapes in higher dimensions or different spaces entirely. For example, suppose you want to measure the length of a book (so...

Netflix Prize Summary: Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights

13.03.2011

(Way back when, I went through all the Netflix prize papers. I’m now (very slowly) trying to clean up my notes and put them online. Eventually, I hope to have a more integrated tutorial, but here’s a rough draft for now.) This is a summary of Bell and Koren’s 2007 Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolat...

Topological Combinatorics and the Evasiveness Conjecture

13.03.2011

The Kahn, Saks, and Sturtevant approach to the Evasiveness Conjecture (see the original paper here) is an epic application of pure mathematics to computer science. I’ll give an overview of the approach here, and probably try to add some more information on the problem in other posts. tl;dr: The KSS approach provides an algebraic-topological atta...

Counting Clusters

13.03.2011

Given a set of numerical datapoints, we often want to know how many clusters the datapoints form. Two practical algorithms for determining the number of clusters are the gap statistic and the prediction strength. Gap Statistic: The gap statistic algorithm (from “Estimating the number of clusters in a data set via the gap statistic”) works as follows...

Eigensheep

13.03.2011

Aaron Koblin’s Sheep Market visualization is an awesome use of Mechanical Turk. But it’d be even more awesome if the grid were ordered, so inspired by the use of eigenfaces in facial recognition, I decided to try projecting the sheep onto two dimensions. Principal Sheep Components: After screenshotting the first 50 sheep from the market and no...

A Kernel Density Approach to Outlier Detection

13.03.2011

I describe a kernel density approach to outlier detection on small datasets. In particular, my model is the set of prices for a given item that can be found online. Introduction: Suppose you’re searching online for the cheapest place to buy a stats book you need for class. You initially find the following prices: $50 - Amazon; $55 - Barnes & Nobl...
