Publications by David Smith
See the wind
The image below isn't a bearskin rug in the shape of the USA. In fact, it's a visualization of the wind flowing over the United States, as of 4PM EDT today, March 30. You can click through to see the current wind conditions, based on the latest data from the National Digital Forecast Database. But more importantly, as long as you have a modern bro...
3-D graphing with Google
You probably already knew that you can draw mathematical equations in Google by typing the equation into the search box. For example, here's the Standard Normal density function: I can't find a way to embed the graph directly, but if you click on it you'll find it's interactive: you can inspect points, zoom in/out etc. You can create a similar c...
Marketing optimization with LityxIQ
Marketing is one of the pioneering domains when it comes to applications of predictive analytics to Big Data. (For example, how Target used statistical modeling to predict demographic attributes of customers, like pregnancy, to target coupons.) To get such powerful insights into the hands of marketers, DC-area company LityxIQ provides a cloud-bas...
How R finds objects (or, what that :: operator is for)
Most of the time when we're programming in R, we don't think about how R gets from an object name (say, “stdev”) to what it represents (a function to calculate standard deviation, perhaps). If you're writing functions, you probably know about R's lexical scoping. And if you use a lot of packages, you probably know about the search list, th...
Compete in the Data Science Hackathon, April 28
At noon GMT on April 28, data scientists around the world will compete in the world's first one-day International Data Science Hackathon, organized by Data Science London. Participants will receive a data set at the beginning of the event, and work in teams of 3-5 over the following 24 hours to create the best predictive m...
The race for speed at the data layer
The competition amongst database vendors to create the fastest, most powerful “data layer” — the hardware and software to provide storage for Big Data with high-performance data processing — is clearly heating up. The Netezza appliance has been so successful that IBM has been racing to keep up with demand. SAP is also seeing success with...
R at the Consumer Financial Protection Bureau
The O'Reilly Radar blog has a lengthy and very interesting interview with the lead and deputy CIOs of the Consumer Financial Protection Bureau, the new US government agency devoted to consumer protections in the financial markets. In that interview, they talk about the many open-source tools used in the agency (and the parent Treasury Departmen...
The age of sail, visualized
As anyone who's ever played Civilization[*] knows, the advent of sailboats capable of crossing the oceans leads to an explosion of exploration, commerce and social development. And with the visualization below, you can see that explosion in action: Ben Schmidt used the R language and data recorded by hand in ship logs[**] to create the anima...
In case you missed it: March 2012 Roundup
In case you missed them, here are some articles from March of particular interest to R users. New features in the latest version of ggplot2 include choropleths, violin plots, and improved annotations. A video demonstration of big-data Naive Bayes and Classification Tree models with Revolution R Enterprise for IBM Netezza. A collection of two-mi...
R’s continued growth in academia
Bob Muenchen has recently updated his report on the popularity of statistical software. With the updated analysis, we see that the R community remains as strong as ever: the number of contributed R packages continues its exponential growth rate, R maintains its dominance in online discussion, and has 20x the content of other statistics packages o...