Publications by David Smith
Analyzing weblog data with R
The R-chart blog explains how to read a weblog file into R, so you can analyze traffic to a website. For example, here's a page request chart created with R: Now, charts like this are stock-in-trade for tools like Google Analytics, but this is still useful if you want to look at the performance of a site that hasn't been instrumented for analytic...
Creating beautiful maps with R
Spanish R user and solar energy lecturer Oscar Perpiñán Lamigueiro has written a detailed three-part guide to creating beautiful maps and choropleths (maps color-coded with regional data) using the R language. Motivated by the desire to recreate this graphic from the New York Times, Oscar describes how he creates similar high-quality maps usi...
Revolution Analytics at Strata 2012
One of my favourite conferences, Strata: Making Data Work, starts tomorrow in Santa Clara, CA. Revolution Analytics is a proud sponsor, and I'll be there with the team to listen to some great talks and to meet other R users at our booth in the exhibition hall. There will be several R-related talks and tutorials during the conference, including tw...
RHadoop updated: improved performance and more control
Revolution Analytics' open-source RHadoop project, which provides integration between R and Hadoop, has been updated with the release of version 1.2 of the “rmr” package. New in this version: support for binary I/O formats, which improves on the text-only interface by allowing use of faster and more space-efficient data formats like R's nativ...
R integrated throughout the enterprise analytics stack
The past couple of years have seen a dramatic growth in the use of the R language in the enterprise. R has always been pervasive in academia for research and teaching in statistics and data science, and as new graduates trained in R have migrated to the workplace the demand for R in corporations has become more and more intense. Database vendor...
Webinar tomorrow: Big-data statistics with Revolution R with IBM Netezza
As explained in detail by Michele Chambers at the IBM Netezza blog, there are two keys to getting fast performance with statistical analysis on massive data sets with R: (1) Massive parallelization: break the problem down into small pieces, and run them in parallel. (2) Bring the R engine to the data (not the other way around), to avoid data transfer del...
R turns 12; R 2.14.2 is out
As promised by the R Core Group, R 2.14.2 is out. This is the final patchlevel of the R 2.14.x series (R 2.15.0 is due on March 30), and so R 2.14.2 will be the R engine for the next release of Revolution R Enterprise in a couple of months. Today also marks the 12th anniversary of the release of R 1.0.0 on February 29, 2000. Of course, R had b...
Bad Science at Strata 2012
Ben Goldacre, the physician and biostatistician behind the always-excellent Bad Science column in the Guardian, gave a barnburner of a talk at Strata 2012 yesterday, “The Information Architecture of Medicine is Broken”. For anyone not aware of the problems caused by publication bias in clinical trials (for example, ineffective drugs with a ...
New data visualization features in ggplot2 update
Hadley Wickham has just released an update to the ggplot2 graphics package for R. Version 0.9.0 significantly speeds up the process of rendering graphics, and the documentation is much improved (including the addition of many new examples). This update also adds a bunch of new features, which are documented in this 40-page “changes and addit...
Dr Sanjiv Das presents "Using R for Analyzing Loans, Portfolios and Risk"
In a free webinar tomorrow at 10AM Pacific, Professor of Finance Dr Sanjiv Das will present “Using R for Analyzing Loans, Portfolios and Risk: From Academic Theory to Financial Practice”. I saw a version of Dr Das's talk a couple of months ago at the Bay Area R User Group meeting, and it was outstanding. I particularly recall his analysis t...