Publications by Joseph Rickert

Baseball, T-tests and statistical surprises


Are MLB players better hitters now than they were 20 years ago? Revolution Analytics' Joseph Rickert uses R to take a look at the data and, in the process, offers an instructive lesson in checking the assumptions behind statistical tests – Ed. Data are everywhere – but, even for simple things, I still seem to spend too much time surfing the we...

New functions for linear model inference in Revolution R Enterprise 4.3


The latest release of Revolution R Enterprise shows how Revolution Analytics’ package for big data, RevoScaleR, continues to add new capabilities for big data statistics. RevoScaleR removes the limits on the size of the data that can be processed in R through the use of the highly efficient .Xdf binary file format. Xdf stores data by rows with...

K-Means Clustering on Big Data


In this post Joseph Rickert demonstrates how to build a clustering model on a large data set with the RevoScaleR package. A script file for use with Revolution R Enterprise to recreate the analysis below is at the end of the post, and can also be downloaded here – ed. The k-means (Lloyd) algorithm, an intuitive way to explore the structure ...
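The excerpt names the k-means (Lloyd) algorithm without spelling it out. As a rough illustration only, here is a minimal in-memory sketch of Lloyd's two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, stopping once no centroid moves. It assumes 2-D points and random initial centroids, and the function name `lloyd_kmeans` is hypothetical. It is not the RevoScaleR implementation used in the post, which is designed for data stored in .Xdf files.

```python
import random


def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical sketch: cluster 2-D points into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct input points (an assumption;
    # other initialisation schemes are common).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    (
                        sum(x for x, _ in members) / len(members),
                        sum(y for _, y in members) / len(members),
                    )
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, labels
```

For example, on the six points `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the sketch should separate the three points near the origin from the three near (10, 10). Working through the whole data set on every iteration is exactly what becomes expensive for big data, which is the problem the post addresses.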

A Work of Art: Efron on Bayesian Inference


(Contributing blogger Joseph Rickert reports from the Stanford University Statistics Seminar series – ed.) Stanford University is very gracious about letting the general public attend many university events. Yesterday, it caught my eye that Bradley Efron was going to speak on Bayesian inference and the parametric bootstrap at the weekly Statist...

Where to find data to use with R


(Contributing blogger Joe Rickert has put together a fantastic list of data sources suitable for use with R. If you're looking for data to use in the Applications of R Contest (entries close October 31), this is a great resource for you – Ed.) Hardly a day goes by without someone or something reminding me that we are drowning in a sea of d...

ACM Data Mining Camp 2011: Report


(By Joseph Rickert.) In San Jose, topics like big data, MapReduce, predictive models, mobile analytics and crowdsourcing draw a crowd even on a Saturday. So it turned out that the ACM Data Mining Camp and “un-conference” was a very “happening” way to spend a Saturday. Over 500 people attended the event at the eBay “Town Hall” on North...

Review of "The Art of R Programming" by Norman Matloff


By Joseph Rickert. Anyone seeking to learn R faces two major challenges: (1) learning how to swim in the sea of information (R packages, books, websites, blog posts, message boards, etc.) that threatens to drown a newbie, and (2) coming to grips with the structure, syntax and features of the language itself. Having some idea of what one wants to ...

The Bay Area R User Group Meeting on Data Mining with R


By Joseph Rickert. Put up a poster that says something like “Data Mining with R” anywhere in the Bay Area and you will surely draw a crowd. But it was still a bit of a surprise that the monthly meeting of the Bay Area R User Group was so well attended. At one point there were 160 people on the meetup list signed up to attend the event, and...

Review of ‘R in Action’ by Robert I. Kabacoff


By Joseph Rickert. Yesterday, the cosmic randomizer placed me next to a newly minted lawyer in a crowded Los Gatos coffee shop. In three minutes of conversation I learned that the fellow was interested in corporate law, was about to take a job that would give him a seat in the great VC/start-up game and that he had some understanding of statis...

Coefplot: New Package for Plotting Model Coefficients


By Joseph Rickert. Even to the practiced eye, looking at coefficients in R model summaries can be tedious. And capturing information about the significance of coefficients from scores or maybe even hundreds of models in a way that makes writing the final report a bit easier is a time-consuming and thankless task. Of course, once you know what you...

