Publications by David Smith
A Napa Valley wine tasting map, made with R and ggmap
R has had a maps package available since the very early days. It's great for simple geographic maps, but the political boundaries can be out of date. For more detailed maps, you can also download shape files and use the sp package to draw borders directly. But for accurate and attractive maps of countries, roads and satellite imagery, nothing bea...
Importing public data with SAS instructions into R
Many public agencies release data as fixed-width ASCII files (often called FWF). But with the data all packed together without separators, you need a “data dictionary” defining the column widths (and metadata about the variables) to make sense of them. Unfortunately, many agencies make such information available only as a SAS script, with the column i...
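A minimal base-R sketch of the underlying idea: once the column positions have been read out of the SAS script (by hand, in this sketch), `read.fwf()` can split each packed record into variables. The SAS `INPUT` statement, the variable names and the three data rows below are all hypothetical, made up for illustration. Real data dictionaries often also encode implied decimals and value labels, which this sketch ignores.

```r
# Hypothetical SAS data-dictionary line (not from any real agency):
#   INPUT state $ 1-2  year 3-6  weight 7-12 ;
# Start and end columns taken from that statement
starts <- c(1, 3, 7)
ends   <- c(2, 6, 12)
widths <- ends - starts + 1   # 2, 4, 6 characters per field

# Three made-up fixed-width records with no separators between fields
raw <- c("WA2011001250",
         "CA2011004410",
         "NY2011000975")

df <- read.fwf(textConnection(raw),
               widths     = widths,
               col.names  = c("state", "year", "weight"),
               colClasses = c("character", "integer", "numeric"))
str(df)
```

For a real data file, pass its path as the first argument instead of `textConnection(raw)`.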
In case you missed it: June 2012 Roundup
In case you missed them, here are some articles from June of particular interest to R users. The FDA goes on the record that it's OK to use R for drug trials. A review of talks at the useR! 2012 conference. Using the negative binomial distribution to convert monthly fecundity into the chances of having a baby in a given time period. Some benchmark...
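The simplest version of that conversion assumes the same chance p of conceiving in every month. The number of unsuccessful months before the first success then follows a negative binomial distribution with `size = 1` (a geometric distribution), so `pnbinom()` gives the chance of success within n months. The value of p below is made up, and the article in the roundup may well use a richer model than this sketch.

```r
# Sketch, assuming a constant (hypothetical) monthly fecundity p
p      <- 0.2              # hypothetical chance of conceiving in any one month
months <- c(1, 3, 6, 12)

# P(conception within n months) = P(at most n - 1 unsuccessful months),
# where unsuccessful months ~ negative binomial with size = 1
within <- pnbinom(months - 1, size = 1, prob = p)
round(setNames(within, months), 3)

# Cross-check against the closed form 1 - (1 - p)^n
all.equal(within, 1 - (1 - p)^months)
```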
Napa Valley wine tasting map: interactive version
Got some great reactions to the Napa Valley wine tasting map I posted on Monday, made with the ggmap package. A couple of people asked if similar maps could be made for other wine regions (like Australia's Hunter Valley, or the Walla Walla region in Washington): provided you have a list of winery addresses, tweaks to the same R script should work...
Applications of R at Google
At a talk I saw at the useR! 2012 conference last month, Googler Karl Millar estimated that there are at least 200 active R users at Google, plus another 300+ occasional users participating in Google's internal R support list. But what are all these Google employees doing with R? A post from the Google Research team published on Google+ yesterday ...
Using integer programming in R to optimize cargo loads
Linear programming is a mathematical technique for finding the values of a set of variables (within the bounds of some defined constraints) that maximize the value of a quantity. For example, consider this problem from the FishyOperations blog: A trading company is looking for a way to maximize profit per transportation of their goods. The comp...
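A minimal sketch of this kind of problem with the lpSolve package (assumed to be installed from CRAN). The three goods, their profits, weights and volumes, the truck's capacity and the stock limit are all hypothetical numbers, not the data from the FishyOperations example. Setting `all.int = TRUE` is what turns the linear program into an integer program, since a load can only contain whole units.

```r
# Sketch: choose how many units of each good to load onto one truck.
# All numbers are hypothetical, made up for illustration.
library(lpSolve)

goods  <- c("A", "B", "C")
profit <- c(30, 50, 20)      # profit per unit shipped
weight <- c(100, 150, 50)    # kg per unit
volume <- c(1, 2, 0.5)       # cubic metres per unit

# One row per constraint: total weight, total volume, units of C in stock
constraints <- rbind(weight, volume, c(0, 0, 1))
directions  <- c("<=", "<=", "<=")
limits      <- c(1000, 10, 8)   # 1000 kg, 10 m^3, 8 units of C available

load <- lp(direction    = "max",
           objective.in = profit,
           const.mat    = constraints,
           const.dir    = directions,
           const.rhs    = limits,
           all.int      = TRUE)   # whole units only

load$status                       # 0 means an optimal solution was found
setNames(load$solution, goods)    # units of each good to load
load$objval                       # total profit of that load
```

With these made-up numbers the optimum is 6 units of A, none of B and 8 of C, for a profit of 340: C earns the most per cubic metre but only 8 units are in stock, and A is the next best use of the remaining space.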
The R packages in a data scientist’s toolbox
John Myles White, self-described “statistics hacker” and co-author of “Machine Learning for Hackers”, was interviewed recently by The Setup. In the interview, he describes some of his go-to R packages for data science: Most of my work involves programming, so programming languages and their libraries are the bulk of the software I use...
Preparing public data for analysis with R
In most data science applications, preparing the data is at least half the job. Finding where the data lives, figuring out how to access it, finding the right records, filtering, cleaning and transforming the data … all of this has to be done before the statistical analysis can even begin. Fortunately, the R language has many tools for data pro...
Coke vs Soda vs Pop: Linguistic trends analyzed with Twitter and R
Growing up in Australia, I always called a carbonated drink like Pepsi or Fanta or lemonade just a “soft drink”. (Also, 'lemonade' in Australia is something different from 'lemonade' in the US; it's closer to 7-Up.) So when I moved to Seattle, it was surprising to me that all such things were called “pop”. And then, travelling ...
R Journal, June 2012
The June 2012 issue of the R Journal, the peer-reviewed, open-access journal about R packages and applications of R, is now available. This issue includes articles about: efficiently calling C functions from R without the need for wrapper code; using clusters of Macs running Apple Xgrid for parallel distributed processing with R; semi-automated text class...