Publications by Joseph Rickert

Baseball, T-tests and statistical surprises

31.03.2011

Are MLB players better hitters now than they were 20 years ago? Revolution Analytics' Joseph Rickert uses R to take a look at the data, and offers an instructive lesson in checking your assumptions for statistical tests in the process – Ed. Data are everywhere – but, even for simple things, I still seem to spend too much time surfing the we...

3640 sym R (2071 sym/3 pcs) 4 img

New functions for linear model inference in Revolution R Enterprise 4.3

26.04.2011

The latest release of Revolution R Enterprise shows how Revolution Analytics’ package for big data, RevoScaleR, is continuing to add new capabilities for Big Data statistics. RevoScaleR removes the limits on the size of the data that can be processed in R through the use of the highly efficient .Xdf binary file format. Xdf stores data by rows with...

3073 sym R (1860 sym/5 pcs) 1 tbl

K-Means Clustering on Big Data

07.06.2011

In this post Joseph Rickert demonstrates how to build a classification model on a large data set with the RevoScaleR package. A script file for use with Revolution R Enterprise to recreate the analysis below is at the end of the post, and can also be downloaded here — ed. The k-means (Lloyd) algorithm, an intuitive way to explore the structure ...

4738 sym R (901 sym/1 pcs) 4 img

A Work of Art: Efron on Bayesian Inference

06.10.2011

(Contributing blogger Joseph Rickert reports from the Stanford University Statistics Seminar series – ed.) Stanford University is very gracious about letting the general public attend many university events. Yesterday, it caught my eye that Bradley Efron was going to speak on Bayesian inference and the parametric bootstrap at the weekly Statist...

2712 sym

Where to find data to use with R

11.10.2011

(Contributing blogger Joe Rickert has put together a fantastic list of data sources suitable for use with R. If you're looking for data to use in the Applications of R Contest — entries close October 31 — this is a great resource for you — Ed.) Hardly a day goes by without someone or something reminding me that we are drowning in a sea of d...

4714 sym

ACM Data Mining Camp 2011: Report

18.10.2011

(By Joseph Rickert.) In San Jose topics like big data, map reduce, predictive models, mobile analytics and crowdsourcing draw a crowd even on a Saturday. So it turned out that the ACM Data Mining Camp and “un-conference” was a very “happening” way to spend a Saturday. Over 500 people attended the event at the eBay “Town Hall” on North...

4452 sym

Review of "The Art of R Programming" by Norman Matloff

29.11.2011

By Joseph Rickert. Anyone seeking to learn R faces two major challenges: (1) learning how to swim in the sea of information: R packages, books, websites, blog posts, message boards etc. that threatens to drown a newbie and (2) coming to grips with the structure, syntax and features of the language itself. Having some idea of what one wants to ...

3895 sym

The Bay Area R User Group Meeting on Data Mining with R

16.12.2011

By Joseph Rickert. Put up a poster that says something like “Data Mining with R” anywhere in the Bay Area and you will surely draw a crowd. But it was still a bit of a surprise that the monthly meeting of the Bay Area R User Group was so well attended. At one point there were 160 people on the meetup list signed up to attend the event, and...

5235 sym

Review of ‘R in Action’ by Robert I. Kabacoff

20.12.2011

By Joseph Rickert. Yesterday, the cosmic randomizer placed me next to a newly minted lawyer in a crowded Los Gatos coffee shop. In three minutes of conversation I learned that the fellow was interested in corporate law, was about to take a job that would give him a seat in the great VC/start-up game and that he had some understanding of statis...

3761 sym R (254 sym/1 pcs) 2 img

Coefplot: New Package for Plotting Model Coefficients

03.01.2012

By Joseph Rickert. Even to the practiced eye, looking at coefficients in R model summaries can be tedious. And capturing information about the significance of coefficients from scores or maybe even hundreds of models in a way that makes writing the final report a bit easier is a time-consuming and thankless task. Of course, once you know what you...

2398 sym 2 img 1 tbl