Publications by David Smith

See the wind

30.03.2012

The image below isn't a bearskin rug in the shape of the USA. In fact, it's a visualization of the wind flowing over the United States, as of 4PM EDT today, March 30. You can click through to see the current wind conditions, based on the latest data from the National Digital Forecast Database. But more importantly, as long as you have a modern bro...

1229 sym 2 img

3-D graphing with Google

02.04.2012

You probably already knew that you can draw mathematical equations in Google by typing the equation into the search box. For example, here's the Standard Normal density function: I can't find a way to embed the graph directly, but if you click on it you'll find it's interactive: you can inspect points, zoom in/out etc. You can create a similar c...

1340 sym R (24 sym/1 pcs) 4 img
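
For reference, the Standard Normal density mentioned in the entry above is exp(-x^2 / 2) / sqrt(2 * pi). The following is a minimal sketch that evaluates it at a few points; it is not taken from the original post, and the function name `standard_normal_pdf` is illustrative only.

```python
import math


def standard_normal_pdf(x: float) -> float:
    """Density of the Standard Normal distribution (mean 0, sd 1) at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    # Print the density at a handful of points around the mean.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x = {x:+.1f}  density = {standard_normal_pdf(x):.4f}")
```

The curve is symmetric about zero and peaks there at roughly 0.3989, which is the bell shape an interactive graph of this function would display.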

Marketing optimization with LityxIQ

03.04.2012

Marketing is one of the pioneering domains when it comes to applications of predictive analytics to Big Data. (For example, how Target used statistical modeling to predict demographic attributes of customers, like pregnancy, to target coupons.) To get such powerful insights into the hands of marketers, DC-area company LityxIQ provides a cloud-bas...

2864 sym

How R finds objects (or, what that :: operator is for)

04.04.2012

Most of the time when we're programming in R, we don't think about how R gets from an object name (say, “stdev”) to what it represents (a function to calculate standard deviation, perhaps). If you're writing functions, you probably know about R's lexical scoping. And if you use a lot of packages, you probably know about the search list, th...

2454 sym 2 img

Compete in the Data Science Hackathon, April 28

05.04.2012

At noon GMT on April 28, data scientists around the world will compete in the first one-day International Data Science Hackathon, organized by Data Science London. Participants will receive a data set at the beginning of the event, and work in teams of 3-5 over the following 24 hours to create the best predictive m...

1325 sym

The race for speed at the data layer

06.04.2012

The competition amongst database vendors to create the fastest, most powerful “data layer” — the hardware and software to provide storage for Big Data with high-performance data processing — is clearly heating up. The Netezza appliance has been so successful that IBM has been racing to keep up with demand. SAP is also seeing success with...

3641 sym

R at the Consumer Financial Protection Bureau

10.04.2012

The O'Reilly Radar blog has a lengthy and very interesting interview with the lead and deputy CIOs of the Consumer Financial Protection Bureau, the new US government agency devoted to consumer protections in the financial markets. In that interview, they talk about the many open-source tools used in the agency (and the parent Treasury Departmen...

2384 sym

The age of sail, visualized

11.04.2012

As anyone who's ever played Civilization[*] knows, the advent of sailboats capable of crossing the oceans leads to an explosion of exploration, commerce and social development. And with the visualization below, you can see that explosion in action: Ben Schmidt used the R language and data recorded by hand in ship logs[**] to create the anima...

1991 sym

In case you missed it: March 2012 Roundup

12.04.2012

In case you missed them, here are some articles from March of particular interest to R users. New features in the latest version of ggplot2 include choropleths, violin plots, and improved annotations. A video demonstration of big-data Naive Bayes and Classification Tree models with Revolution R Enterprise for IBM Netezza. A collection of two-mi...

2670 sym

R’s continued growth in academia

13.04.2012

Bob Muenchen has recently updated his report on the popularity of statistical software. With the updated analysis, we see that the R community remains as strong as ever: the number of contributed R packages continues its exponential growth rate, R maintains its dominance in online discussion, and has 20x the content of other statistics packages o...

1605 sym 4 img