Publications by David Smith
Winners of 2010 ggplot2 case study competition
The winners of this year's ggplot2 case study competition have been announced. I was honoured to be asked to judge the competition this year, but it was a difficult job with so many excellent entries. In the end, the judging panel (which included Heike Hofmann, Hadley Wickham and me) selected three entries which each demonstrated the a...
An Old Wives Tale from the 2000 Census
With the data from the 2010 US Census to be published early next year, here's a cautionary tale from the 2000 Census. If you take a look at the ratio of numbers of men to women in the 5-Percent “PUMS” sample from the 2000 census over various ages, you'll see an odd spike near age 65: What causes this strange anomaly in the data? In the vide...
Hold on to your hats: it’s World Statistics Day!
Apparently today is the first ever World Statistics Day. I only knew about it because I'd seen a couple of passing references to it from the stats folks I follow on Twitter. But I guess this UN-sponsored event is a big deal, judging from the official website: The celebration of the World Statistics Day will acknowledge the service provided by the...
R is Hot: Part 3
This is Part 3 of a five-part article series, with new parts published each Thursday. You can download the complete article from the Revolution Analytics website. Power from Elegance: If the R movement has a genuine rock star, it’s probably Hadley Wickham. He’s an assistant professor and the Dobelman Family Junior Chair in Statistics at Rice Un...
A workflow for R
Writing an R script is one thing. Organizing your process (where to put the data, how to refer to files in scripts, how to run the scripts, and how to produce, collect and report the results) is quite another. Every R user has their own workflow for doing data analysis with R, but the best workflows achieve the following goals: Transpare...
Because it’s Friday: Arthur C Clarke predicts the present
On the BBC Horizon programme in 1964, Arthur C Clarke made some predictions about the future. He prefaced his predictions with the following caveat: If, by some miracle, a prophet could describe the future exactly as it was going to take place, his predictions would sound so absurd, so farfetched, that everybody would laugh him to scorn. So what ...
The language of Statistics
R is the lingua franca of Statistics: R code and R packages are the means by which statisticians communicate ideas and methods for statistical analysis. The reasons why are discussed in this article, but it also raises the question: what's wrong with the spoken or written word? How Statistics and Probability relate to the English language is the sub...
Upcoming R courses from Statistics.com
The online training provider Statistics.com has three great courses based on R coming up in the next few months: Nov. 5 – Dec. 3: “Graphics in R,” with Paul Murrell; Nov. 20 – Dec. 18: “Support Vector Machines in R” with Dr. Lutz Hamel; Dec. 17 – Jan. 22: “Geostatistics in R” with Prof. David Unwin. The courses take place online at...
R nominated for best open-source project in New Zealand
The R project, born in New Zealand in 1993, has been nominated as the best open-source project in the New Zealand Open-Source Awards 2010. R's co-creator Ross Ihaka talks about the project in this article by the New Zealand Herald: Ross Ihaka from the University of Auckland started developing R 20 years ago, but it took off about a decade ago as ...
InfoWorld: R a programming language "on the rise"
In an article looking at once-niche programming languages that are now being deployed in businesses, R is named as one of 7 programming languages on the rise: R is another Swiss Army Knife of numerical and statistical routines for hacking through the big data sets — collections big enough that it might be better called a Swiss Army Machete. Lo...