Publications by David Smith

New R User Groups in Ankara, Toronto


Two new local R user groups to report this week. In Turkey, the Ankara R Users Group has just started up. No meetings are scheduled yet, so be sure to suggest a meeting time/location when you sign up. The Toronto-based R Matlab Users group focuses on financial services applications. Created by Bryan Downing (who also produces the QuantLabs blog),...


Figuring an exchange rate for sports scores


While the US's Major League Soccer is using advanced analytics to analyze ball movement and improve team composition, they might want to think about a smaller, but possibly more impactful, goal for analytics. Like, how to explain to an American audience what a 1-2 game means to a basketball or baseball fan not familiar with scoring in the beautif...


Orbitz and the Macs: Signals, not segmentation


By now you've probably heard about the fact that Orbitz users accessing the site via Macs are seeing more expensive hotel options when they search. But it seems worth clearing up a couple of fallacies. First, it's not as if the same hotel room is being offered at a higher price to Mac users. (So no, using Windows to access Orbitz won't get you a...


Big Data Generalized Linear Models with Revolution R Enterprise


R's glm function for generalized linear modeling is very powerful and flexible: it supports all of the standard model types (binomial/logistic, Gamma, Poisson, etc.) and in fact you can fit any distribution in the exponential family (with the family argument). But if you want to use it on a data set with millions of rows, and especially with mor...


Trying for a baby? Here’s how long it might take.


Wanting to start a family the natural way? For a healthy 45-year-old woman, you may be in for a five-year wait. That's the conclusion of Richie Cotton, a UK-based data scientist, who discovered when he and his girlfriend wanted to start a family that statistics on how long it takes to get pregnant are hard to come by. The National Health Service ...


A big list of the things R can do


R is an incredibly comprehensive statistics package. Even if you just look at the standard R distribution (the base and recommended packages), R can do pretty much everything you need for data manipulation, visualization, and statistical analysis. And for everything else, there are more than 5000 packages on CRAN and other repositories, and the big...


The role of Statistics in the Higgs Boson discovery


News is starting to leak that the Large Hadron Collider may have accomplished its primary mission of confirming the existence of the hypothesised and heretofore elusive subatomic particle, the Higgs Boson. And sure, billions of Euros' worth of state-of-the-art high-energy machinery and an army of experimental and theoretical physicists probably ha...


A new open journal on Data Science


Springer has introduced a new open, peer-reviewed journal focused on Data Science: EPJ Data Science. What makes this a Data Science journal is novel uses of statistics, data analysis, computer techniques and public data sources to research a topic in another domain, rather than methodological research. Here are a few examples of the papers you'll...


New R User Group in Leipzig, Germany


Leipzig R Statistical Computing is the sixth local R user group in Germany, and has been holding meetings since February. In the next meeting on July 12, member Claudia Beleites will talk about her packages softclassval (for classifier performance measures) and hyperspec (for hyperspectral data). Leipzig R Statistical Computing R...


Three hours of pure soccer emotion, visualized with R


The biggest prize in UK soccer, the Premier League Championship, is decided by a points system. Unlike most sports competitions, there's no final round or playoff series: once the regular round of games is complete, the team that has accumulated the most points (three for a win, and one for a draw) is the champion of English football. In the eve...
