Publications by Joseph Rickert

Coefplot: New Package for Plotting Model Coefficients

03.01.2012

By Joseph Rickert. Even to the practiced eye, looking at coefficients in R model summaries can be tedious. And capturing information about the significance of coefficients from scores or maybe even hundreds of models, in a way that makes writing the final report a bit easier, is a time-consuming and thankless task. Of course, once you know what you...

2398 sym 2 img 1 tbl

Simple tools for building a recommendation engine

19.04.2012

By Joseph Rickert. Revolution’s resident economist, Saar Golde, is very fond of saying that “90% of what you might want from a recommendation engine can be achieved with simple techniques”. To illustrate this point (without doing a lot of work), we downloaded the million-row movie dataset from www.grouplens.org with the idea of just taking the fi...

5076 sym 3 tbl

Simulating the Birthday Problem with data derived probabilities

06.06.2012

You've probably heard of the Birthday Paradox: it only takes a small gathering of people before it's quite likely that two of them share the same birthday. You can solve the problem analytically or with simulation, but usually in either case simplifying assumptions are made (no-one born on February 29, for example). Joe Rickert uses Revolution R ...

6667 sym R (489 sym/1 pcs) 8 img

Benchmarking bigglm

13.11.2012

By Joseph Rickert. In a recent blog post, David Smith reported on a talk that Steve Yun and I gave at STRATA in NYC about building and benchmarking Poisson GLM models on various platforms. The results presented showed that the rxGlm function from Revolution Analytics’ RevoScaleR package running on a five-node cluster outperformed a Map Reduce/ H...

5113 sym R (1133 sym/3 pcs)

A Review of the R Graphics Cookbook

11.02.2013

A common criticism of R, especially from data scientists who are new to R but proficient in multiple programming languages, is that R is “quirky” and annoying because there is almost always more than one way to do simple things. I usually counter that they are trying to say that R is “flexible” and “rich”, but by the time we get aro...

6330 sym 4 img

Data Science Education gets personal

14.03.2013

By Joseph B. Rickert. It is difficult to imagine that there is anyone on the planet with an internet connection and a desire to learn something new who has not at least looked into taking a massive open online course (MOOC). Last fall, in an 11/4/12 article, the New York Times declared the Year of the MOOC and quoted one of Coursera’s founders, A...

4435 sym

R’s Garden of Probability Distributions

21.03.2013

By Joseph Rickert. If you type ?Distributions at the R console you get a list of the 21 probability distributions included in the stats package that ships with base R. The same list appears in the Introduction to R Manual on CRAN and in most of the many fine introductory books available for the R language. These are indeed fundamental distribution...

5127 sym R (366 sym/1 pcs) 4 img

Lots of data != "Big Data"

28.03.2013

By Joseph Rickert. When talking with data scientists and analysts who work with large-scale data analytics platforms such as Hadoop about the best way to do some sophisticated modeling task, it is not uncommon for someone to say, “We have all of the data. Why not just use it all?” This sort of comment often initially sounds pragm...

5731 sym R (168 sym/1 pcs) 4 img

R User Groups Continue to Grow

01.04.2013

By Joseph Rickert. R user groups seem to be sprouting all over. Since last September we have noticed ten new groups worldwide: Auckland, New Zealand: Auckland-R-Users-Group (AKLRUG) had 33 people attend their March 8th meeting; Chang Mai, Thailand: Chang Mai is the first R user group in Thailand; Durban, South Africa: The Durban R User Group is look...

1556 sym 2 img

An Introduction to SAS for R Programmers

04.04.2013

By Joseph Rickert. Life decisions are usually much too complicated to be attributed to any single cause, but one important reason that I am here at Revolution today is that I ignored suggestions from well-meaning faculty back in graduate school to work more in SAS rather than doing everything in R. There was a heavy emphasis on SAS then: the facul...

6502 sym