Publications by David Smith

Dress your R code for the Web with Pretty R

04.11.2010

If you have some R code to include in a document, especially a Web-based document like a blog post, the new “Pretty R” feature on inside-R.org can help you make it look its best. Given some raw R code, it will create an HTML version of the code, adding syntax highlighting elements and links. Functions, strings, comments and literals are all co...

1755 sym R (607 sym/1 pcs) 2 img

ACM Data Mining Camp 3

05.11.2010

The San Francisco Bay Area chapter of the ACM will hold its third data mining camp next Saturday (November 13) at the eBay campus in San José. Like the previous camps, this will be a one-day “unconference”-style event, with an agenda developed ad hoc on the day according to the interests of the attendees. With data scientists from the like...

1097 sym

Because it’s Friday: Epidemiology in 1632

05.11.2010

I first got interested in epidemiology when I saw the famous John Snow chart (in a Tufte book, I think?) which pinpointed the pump which caused the 1854 cholera outbreak in London. For some reason I'd gotten the impression that this was essentially the birth of epidemiology as a discipline, but it's actually been around a lot longer than that. 20...

1251 sym 2 img

The Dataists answer your questions

08.11.2010

The fine bloggers (and R experts) at the Dataists have volunteered to answer questions about data analysis on Reddit: A few months ago, a group of like-minded folks in New York and the San Francisco Bay Area decided it was time to start a blog about data, and we came up with the Dataists. Since then we thought about a taxonomy of data science, c...

1249 sym

Using R and Hadoop to analyze VOIP data

08.11.2010

Last month, the newest member of Revolution's engineering team, Saptarshi Guha, gave a presentation at Hadoop World 2010 on using R and Hadoop to analyze 1.3 billion voice-over-IP packets to identify calls and measure call quality. Saptarshi, of course, is the author of RHIPE, which lets R programmers write map-reduce algorithms in the Hadoop fra...

1199 sym

New R User Group in Houston

09.11.2010

The latest local R user group to form is located in Houston, Texas. The first meeting of the Houston R Users Group is tonight at Rice University (in conjunction with the Houston chapter of the ASA). R hackr (typo intended!) Hadley Wickham will be giving a presentation on writing R packages, and you can check out the slides on his SlideShare page....

881 sym

Promote your favorite R functions

09.11.2010

The 27 base and recommended libraries of the standard R 2.12 distribution together contain 3556 functions (you can check using the code posted after the jump). Many of the functions are commonly used: c, data.frame, rnorm, lm. But some of those functions, while being extremely useful, may be less well known to many R users. Some examples I'd wish...

1414 sym R (569 sym/1 pcs)

R co-creator Ross Ihaka wins Lifetime Achievement Award in Open Source

09.11.2010

The co-creator of R, University of Auckland Associate Professor of Statistics Dr. Ross Ihaka, was yesterday awarded the Catalyst Lifetime Achievement in Open Source Award at the 2010 New Zealand Open Source Awards. From the announcement: Dr. Ihaka is one of the originators of the world-renowned ‘R’ programming language and software environme...

1099 sym

Tell Forbes how you use R

10.11.2010

Steve McNally of the Forbes Mean Business blog says R is a name you need to know for 2011. He cites some great examples of R in action: Facebook has used R to figure out that “just two data points are significantly predictive of whether a user remains on Facebook: (i) having more than one session as a new user, and (ii) entering basic profile ...

1624 sym

Help Mozilla visualize how people use Firefox

11.11.2010

You might recall that a couple of weeks ago we posted this chart summarizing the times of day Firefox users switch on Private Browsing mode: The chart, based on data from the Mozilla Test Pilot program, tells an interesting story about the habits of Web users. But what other interesting stories could be told, to reveal more insights into how peopl...

2073 sym 2 img