Publications by David Smith

How Orbitz uses Hadoop and R to optimize hotel search


Positional bias (the tendency for users to preferentially select results in the first few positions of a search) is a big issue for all kinds of search engines. But for the online travel site Orbitz, the stakes are higher than for a traditional Web search engine: if a customer chooses the first-listed hotel in a search for accommodations, but wi...
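To make positional bias concrete, here is a minimal Python sketch that tabulates click-through rate by result position. It is not Orbitz's method, which the excerpt doesn't describe. The log format (one `(position, clicked)` pair per result shown), the function name `ctr_by_position`, and the data are all hypothetical. A rate that falls with position is the pattern positional bias produces. Raw rates like these still mix position effects with the fact that better results tend to be ranked higher.

```python
from collections import defaultdict

def ctr_by_position(impressions):
    """Return {position: click-through rate} from (position, clicked) pairs.

    `impressions` is a hypothetical log: one tuple per result shown, where
    `clicked` is True if the user selected that result.
    """
    shown = defaultdict(int)   # how many times each position was displayed
    clicks = defaultdict(int)  # how many of those displays were clicked
    for position, clicked in impressions:
        shown[position] += 1
        if clicked:
            clicks[position] += 1
    # Rate per position, ordered from the top of the results list down
    return {p: clicks[p] / shown[p] for p in sorted(shown)}

# Made-up toy log, for illustration only
log = [
    (1, True), (1, True), (1, False),
    (2, True), (2, False), (2, False),
    (3, False), (3, False), (3, False),
]
print(ctr_by_position(log))
```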

Forbes: R is a name you need to know in 2011


The December 20 issue of Forbes magazine, on newsstands now, includes a column about R on page 128 as part of the “Name You Need to Know in 2011” feature. It's basically an excerpt from this blog post by Steve McNally and its comments, and includes quotes from Norman Nie of Revolution Analytics, Bill Alpert of Barron's, and Brandon Witcher. T...

Citizen Data Journalism: Mexico Homicides


I've recently praised some mainstream media outlets like the New York Times and New Scientist for leading the charge on data journalism. But you don't need to be a large organization to find news in data. With open data sources, and open-source data analysis tools, individuals can make newsworthy discoveries. Diego Valle-Jones has been investigat...

Did you feel that?


There was a small earthquake in northern England on Tuesday. Barry Rowlingson felt the quake (it rattled the photographs on his wall), but didn't know how big it was because he didn't know how close he was to the epicentre. The British Geological Survey hadn't yet announced the quake, but did give access to seismograph readings, which ...

Travel grants and prizes for R/Finance 2011


If you've been thinking about heading to Chicago in April for the R/Finance conference, here's another reason to go: posting for the committee, Dirk Eddelbuettel announced last week that thanks to a favourable response from sponsors[*], the conference organizers can now offer: a competition for best paper, which given the focus of the conference...

Analysis of Facebook status updates


The Facebook Data Team has published an analysis of the status updates of Facebook users, by categorizing words according to the 68 categories of the Linguistic Inquiry and Word Count Dictionary, and tabulating the frequencies of their use. It's interesting to see this kind of analysis applied to Facebook, but it unfortunately doesn't reveal ...
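The categorize-and-tabulate approach described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the real LIWC dictionary is licensed, much larger, and also matches word stems, so `CATEGORIES` below is a tiny hypothetical stand-in with exact-match lookup. The status updates are invented. For each category, the function reports the share of all words that fall into it. As in LIWC, a word may count toward more than one category.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary standing in for LIWC categories
# (not real LIWC content; exact word matches only).
CATEGORIES = {
    "positive_emotion": {"happy", "love", "great"},
    "negative_emotion": {"sad", "hate", "awful"},
    "social": {"friend", "friends", "family"},
}

def category_frequencies(updates):
    """Share of all words in `updates` that fall in each category."""
    total = 0
    counts = Counter()
    for text in updates:
        # Crude tokenizer: lowercase runs of letters and apostrophes
        words = re.findall(r"[a-z']+", text.lower())
        total += len(words)
        for word in words:
            # A word can belong to several categories, so check them all
            for category, vocab in CATEGORIES.items():
                if word in vocab:
                    counts[category] += 1
    return {c: counts[c] / total if total else 0.0 for c in CATEGORIES}

# Invented status updates, for illustration only
updates = ["So happy to see my friends tonight", "Awful day, I hate Mondays"]
print(category_frequencies(updates))
```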

R Packages for Social Search


Jesse Bridgewater works on “social search awesomeness” for the Bing search engine, and is setting up his dev environment with the necessary tools, including python, vim, and R. Jesse has shared the script he uses to install all the specialty packages he relies on for data analysis. It's a handy script to modify for your own purposes, bu...

Revolutions blog: 2010 statistics


Since it's the end of the year, and since this is a statistics blog, I thought I'd pull some data from the blog server and run some numbers on the blog itself. Overall, the blog's average numbers of daily visitors and pageviews have doubled compared to 2009. The number of pageviews varies quite a lot, as you can see from the kernel density estimat...
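For readers curious how a kernel density estimate of daily pageviews is built, here is a rough Python sketch. The original analysis was done in R, and its bandwidth choice isn't stated in the excerpt. This version sums Gaussian kernels by hand and picks the bandwidth with the rule of thumb 0.9 × min(sd, IQR/1.34) × n^(-1/5), which is what R's `density()` uses by default (`bw.nrd0`). The pageview figures are made up, not the blog's real numbers.

```python
import math
import statistics

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    # 'inclusive' quantiles use linear interpolation between order statistics
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread <= 0:
        spread = sd  # fall back to the standard deviation if the IQR is zero
    return 0.9 * spread * n ** (-0.2)

def gaussian_kde(data, bandwidth=None):
    """Return a function estimating the density of `data` at a point x."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(data)
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps of width h centred on each observation
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density

# Hypothetical daily pageview counts (not the blog's real figures)
pageviews = [1200, 1350, 1280, 2100, 1400, 1320, 3900, 1250, 1380, 1500]
f = gaussian_kde(pageviews)
for x in range(1000, 4001, 500):
    print(x, f(x))
```

A single unusually busy day (3900 here) shows up as its own small bump rather than being smeared into the main peak. The small, IQR-driven bandwidth produces that effect.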

First meeting of Toronto R User Group this Friday


The Greater Toronto Area R User Group is having its inaugural meeting this Friday, January 7. The meeting will feature Ben Bolker, professor and author of Ecological Models and Data in R, who will speak on Generalized Linear Models. It's also your opportunity to join the group and help define its future path, so if you're in the area why not joi...

The R Journal: December 2010


Issue 2 of The R Journal (the peer-reviewed journal devoted to R) was published over the Christmas break. In addition to news about the latest release of R, it also includes contributed articles on using GPU processing to fit Bayesian models in R, processing text data in R, solving differential equations in R, and much more. Follow the link below...
