Publications by David Smith

5000 R questions on stackoverflow.com

16.06.2011

The R tag on stackoverflow.com hit a milestone yesterday: 5000 questions about the R language. (The 5000th question was about the fortunes package, incidentally — thanks to Andrie de Vries for pointing this out on Twitter.) Stackoverflow.com continues to be an excellent resource for R users, and the number of questions (and answers!) continues ...

Where Ichiro Hits

16.06.2011

Google research scientist Peter Hauck used Weka and k-means cluster analysis to describe where Mariners right-fielder Ichiro favours hitting the baseball. He then used R to visualize the 6 clusters the k-means analysis identified. I sometimes find k-means clustering tough to explain as a statistical technique, but this makes for a great example: i...
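To make the idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not Hauck's Weka/R workflow. The toy `hits` coordinates are hypothetical, k is 2 rather than the 6 clusters in the post, and the deterministic farthest-first initialization is a swapped-in choice made so the result is reproducible. The loop alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, stopping once the centroids no longer move.

```python
import math


def kmeans(points, k, max_iter=100):
    """Minimal k-means (Lloyd's algorithm) on 2-D points; illustrative sketch only."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Deterministic farthest-first initialisation (an assumed heuristic, not
    # necessarily what Weka uses): start from the first point, then keep adding
    # the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    def assign(cents):
        # Assignment step: index of the nearest centroid for every point.
        return [min(range(k), key=lambda j: math.dist(p, cents[j])) for p in points]

    for _ in range(max_iter):
        labels = assign(centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                # Leave an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids (and hence assignments) no longer change
        centroids = new_centroids

    return centroids, assign(centroids)


# Hypothetical toy "hit locations": two well-separated groups.
hits = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(hits, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Lloyd's algorithm only finds a local optimum, so practical implementations usually try several starting configurations; R's built-in `kmeans()` exposes this through its `nstart` argument.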

R in the Bioinformatics Knowledgeblog

21.06.2011

The Knowledge Blog project is a new, light-weight way of publishing scientific, academic and technical knowledge on the web, across several scientific disciplines. One such discipline is bioinformatics, and the Bioinformatics Knowledgeblog contains useful scientific reference material for bioinformatics, including several resources for R users....

Five things Biologists should know about Statistics

21.06.2011

In a thoughtful blog post, bioinformatician Ewan Birney (Head of Nucleotide Data at the European Bioinformatics Institute) talks about the importance of statistics to biologists: Biology is really about stats. Indeed, the foundation of much of frequentist statistics – RA Fisher and colleagues – were totally motivated by biological problems....

Video: Two R talks from Hadley Wickham

22.06.2011

On his recent tour of the Bay Area, Hadley Wickham gave two interesting R-related talks, for which video has been made available by Google Tech Talks. At the June Bay Area R User Group meeting, Hadley spoke on the future of interactive data visualization in R. Building on his experiences creating the ggplot2 package (which is still under developm...

Speed up R "for" loops 50x with Rcpp

23.06.2011

Christian Gunning has a great example of using Rcpp to speed up a for loop in R. For his agent-based simulation, Christian needed to repeatedly call the rbinom function in a loop. (Unfortunately, you can't pass a vector to the size argument, which would have solved the problem.) Using the aaply function (from the plyr package) took about 38 secti...

The R Journal: June 2011

24.06.2011

The latest issue of the R Journal is out, and as always includes many useful articles about using R and R packages. Articles in Volume 3/1 dive into topics including writing unit tests for R packages with testthat; dealing with times, time zones, dates and holidays with timeDate; social network analysis of mailing lists through text mining; creating...

Benchmarking Revolution R for data mining

28.06.2011

The blog Heuristically Andrew puts Revolution R through its paces by running some benchmarks versus open-source R for data mining applications. The benchmarks set out to answer the following questions: I recently upgraded my notebook (where I often use R for data mining) and was faced with two questions: for the fastest speed for building models,...

R 2.13.1 scheduled for July 8

30.06.2011

The R Core team announced today that the next update to R, version 2.13.1, will be released on July 8. Core team member Peter Dalgaard noted: The 2.13.0 release has been quite solid, but some people expect an x.y.1 to roll out on larger installations for the next academic year. Of course, there have also been a sampling of minor bug fixes. Look...

How to find R experts on LinkedIn

01.07.2011

If you're looking for connections with expertise in R programming, the new Skills and Expertise feature on LinkedIn makes it easy. Just visit the R Skills page for a list of R practitioners on LinkedIn. You can also add “R” to your own list of skills from the same page. You also might want to consider joining the R Project for Statistical Com...
