Publications by Joseph Rickert

MMDS 2010


The 2010 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2010) finished up this past Friday (June 18th) at Stanford. This was an exceptionally well-organized conference: four days of mind-stretching talks on algorithm development and the challenges of working with massive data sets approached from almost every conceivable angle. The app...

6369 sym

Why Learn R? It’s the language of Statistics


In the Introduction to his book “R for SAS and SPSS Users” (Springer 2009) Robert Muenchen offers ten reasons for learning R if you already know SAS or SPSS. All ten reasons say something important about R. However, his fourth reason: “R’s language is more powerful than SAS or SPSS. R developers write most of their analytic methods using ...

4922 sym

Making sense of MapReduce


From guest blogger Joseph Rickert. Last night I went to hear Ken Krugler of Bixolabs talk about Hadoop at the monthly meeting of the Software Developers Forum. Maybe because Ken is an unusually lucid speaker, or maybe because I just reached some sort of cumulative tipping point through the prep work of all those patient people who have tried to h...

2319 sym

ACM Data Mining Camp


By guest blogger Joseph Rickert. I was very happy to be a part of the ACM Data Mining camp held last Saturday (November 13th) at eBay. It was a big day for discussing hot topics in data mining (Mahout, parallel SVMs, etc.) and also a pretty big day for R. Because Revolution Analytics was a sponsor for the camp, I got to give a three minute compan...

3040 sym 2 img

Predicting R models with PMML: Revolution R Enterprise and ADAPA


The recently announced Revolution Analytics / Zementis partnership goes a long way towards demonstrating how R fits into big-league production environments. A frequent complaint against R is that although R is a fine prototyping tool, it is not able to handle production environments. Well, that’s just not true. In fact, it is straightforward to bu...

2688 sym R (827 sym/1 pcs)

Baseball, T-tests and statistical surprises


Are MLB players better hitters now than they were 20 years ago? Revolution Analytics' Joseph Rickert uses R to take a look at the data, and offers an instructive lesson in checking your assumptions for statistical tests in the process – Ed. Data are everywhere – but, even for simple things, I still seem to spend too much time surfing the we...

3640 sym R (2071 sym/3 pcs) 4 img

New functions for linear model inference in Revolution R Enterprise 4.3


The latest release of Revolution R Enterprise shows how Revolution Analytics’ package for big data, RevoScaleR, is continuing to add new capabilities for Big Data statistics. RevoScaleR removes the limits on the size of the data that can be processed in R through the use of the highly efficient .Xdf binary file format. Xdf stores data by rows with...

3073 sym R (1860 sym/5 pcs) 1 tbl

K-Means Clustering on Big Data


In this post Joseph Rickert demonstrates how to build a classification model on a large data set with the RevoScaleR package. A script file for use with Revolution R Enterprise to recreate the analysis below is at the end of the post, and can also be downloaded here — ed. The k-means (Lloyd) algorithm, an intuitive way to explore the structure ...

4738 sym R (901 sym/1 pcs) 4 img

A Work of Art: Efron on Bayesian Inference


(Contributing blogger Joseph Rickert reports from the Stanford University Statistics Seminar series – ed.) Stanford University is very gracious about letting the general public attend many university events. Yesterday, it caught my eye that Bradley Efron was going to speak on Bayesian inference and the parametric bootstrap at the weekly Statist...

2712 sym

Where to find data to use with R


(Contributing blogger Joe Rickert has put together a fantastic list of data sources suitable for use with R. If you're looking for data to use in the Applications of R Contest — entries close October 31 — this is a great resource for you — Ed.) Hardly a day goes by without someone or something reminding me that we are drowning in a sea of d...

4714 sym