Publications by John Myles White

A 3D Version of R’s curve() Function

21.03.2011

I like exploring the behavior of functions of a single variable using the curve() function in R. One thing that seems to be missing from R’s base functions is a tool for exploring functions of two variables. I asked for examples of such a function on Twitter today and didn’t get any answers, so I decided to build my own. As I see it, there ar...
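
A minimal sketch of the idea in base R. Unlike curve(), which takes an expression in x, it assumes a function of two vectorised arguments. It evaluates that function on a regular grid with outer() and draws the surface with persp(). The helper name curve3d_sketch and its arguments are hypothetical; this is not the post's implementation.

```r
# Hypothetical helper: evaluate f(x, y) on an n-by-n grid and draw the surface.
curve3d_sketch <- function(f, from = c(-1, -1), to = c(1, 1), n = 50,
                           draw = TRUE, ...) {
  x <- seq(from[1], to[1], length.out = n)
  y <- seq(from[2], to[2], length.out = n)
  # outer() calls f once with vectors covering every (x, y) pair,
  # so f must be vectorised in both arguments.
  z <- outer(x, y, f)
  if (draw) persp(x, y, z, theta = 30, phi = 30, ...)
  invisible(list(x = x, y = y, z = z))
}

# Compute the grid for a paraboloid without opening a graphics device.
grid <- curve3d_sketch(function(x, y) x^2 + y^2, draw = FALSE)
```

The returned list has x, y and z components, so it can also be passed to contour() or image() for a flat view of the same function.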

A Request for Foursquare Data

25.03.2011

[UPDATE 3/28/2011: Fixed an enormous bug in the R code.] I’m trying to collect data sets that showcase how the classical statistical distributions appear in modern contexts. I’ve already got some data that shows how the gamma distribution appears in video game scores, and now I’m hoping to find an example where the exponential distribution ...

Problems with ggplot2 0.8.9 and R 2.13.0 on Mac OS X via plyr 1.5

14.04.2011

This morning I tried to completely update my R installation. I first dumped a list of all the packages I have on my system using the installed.packages() function. Then I installed R 2.13.0 using the OS X disk image. And finally I reinstalled all of my packages from scratch. Unfortunately, I ran into some serious problems along the way. After ins...
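
The update procedure described above (save the package list, install the new R, reinstall) can be sketched as follows. This is one way to do it, not necessarily the exact commands from the post. The file location and the helper reinstall_missing() are hypothetical. tempdir() keeps the example self-contained, but for a real upgrade the list should be saved somewhere that survives reinstalling R, such as your home directory.

```r
# Hypothetical location for the saved package list; use a persistent path
# (e.g. in your home directory) for a real upgrade.
pkg_file <- file.path(tempdir(), "installed_packages.rds")

# Step 1, under the OLD installation: record user-installed packages.
# priority = "NA" selects packages that are neither base nor recommended,
# since those come back automatically with a fresh install of R.
saved_pkgs <- rownames(installed.packages(priority = "NA"))
saveRDS(saved_pkgs, pkg_file)

# Step 2, under the NEW installation: reinstall whatever is missing.
reinstall_missing <- function(file) {
  wanted <- readRDS(file)
  missing <- setdiff(wanted, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing, dependencies = TRUE)
  }
  invisible(missing)
}

still_missing <- reinstall_missing(pkg_file)
```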

Norvig and the Nature of Modern Science

27.05.2011

In this, Chomsky is in complete agreement with O’Reilly. (I recognize that the previous sentence would have an extremely low probability in a probabilistic model trained on a newspaper or TV corpus.)[1]

Anyone who considers themself an intellectual should be required to read this new essay by Peter Norvig. It’s the best summary I’ve ever see...

Speeding Up MLE Code in R

18.06.2011

Recently, I’ve been fitting some models from the behavioral economics literature to choice data. Most of these models amount to non-linear variants of logistic regression in which I want to infer the parameters of a utility function. Because several of these models aren’t widely used, I’ve had to write my own maximum likelihood code to esti...
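
A minimal sketch of hand-written maximum likelihood estimation in R. It assumes an ordinary logistic model on simulated data, not the behavioral-economics models or data from the post. The approach is to write a vectorised negative log-likelihood and hand it to optim(). A utility-based model would replace the linear predictor eta with a non-linear function of the parameters. Using plogis(..., log.p = TRUE) avoids taking log(0) when predicted probabilities approach 0 or 1. Fitting the same model with glm() gives a check on the result.

```r
set.seed(1)

# Simulated choice data (hypothetical): binary outcome y, one predictor x.
n <- 1000
x <- rnorm(n)
true_beta <- c(-0.5, 2)
y <- rbinom(n, 1, plogis(true_beta[1] + true_beta[2] * x))

# Vectorised negative log-likelihood of a logistic model.
neg_log_lik <- function(beta, x, y) {
  eta <- beta[1] + beta[2] * x
  # log P(y = 1) and log P(y = 0), computed on the log scale for stability.
  -sum(y * plogis(eta, log.p = TRUE) +
       (1 - y) * plogis(eta, lower.tail = FALSE, log.p = TRUE))
}

# Minimise the negative log-likelihood; extra arguments are passed to fn.
fit <- optim(c(0, 0), neg_log_lik, x = x, y = y, method = "BFGS")

# Reference fit for comparison.
glm_fit <- glm(y ~ x, family = binomial)
```

Operating on whole vectors inside neg_log_lik(), rather than looping over observations, is a common first step when likelihood code like this is slow.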

ProjectTemplate News

25.06.2011

The news below was recently reported on the ProjectTemplate mailing list. For completeness, I’m also reporting it here. The first piece of ProjectTemplate news is that I won’t be the exclusive maintainer for ProjectTemplate anymore. Allen Goodman, who works at BankSimple, is now my co-maintainer and he has full commit privileges. In the next...

Visualizing Periodic Data

28.06.2011

Yesterday the Princeton machine learning reading group went through a paper by Tukey on “Some graphic and semigraphic displays”. One issue we talked about at length was Tukey’s idiosyncratic approach to visualizing periodic data in a circular format to emphasize the connections between the “start” and the “end” of the data set. Alli...
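
A small sketch of the circular layout idea, assuming hourly data with a 24-hour period. Each time is mapped to an angle, so the last hour of the cycle lands next to the first. The helper circular_coords() is hypothetical, and this is not Tukey's exact display. Setting radius to the observed values turns the points into a simple radial plot.

```r
# Hypothetical helper: place times from a periodic variable on a circle.
circular_coords <- function(t, period, radius = 1) {
  angle <- 2 * pi * (t %% period) / period
  # Start at 12 o'clock and run clockwise, like a clock face.
  data.frame(x = radius * sin(angle), y = radius * cos(angle))
}

hours <- 0:23
pts <- circular_coords(hours, period = 24)

# Euclidean distance between the points for hours a and b.
p <- as.matrix(pts)
gap <- function(a, b) sqrt(sum((p[a + 1, ] - p[b + 1, ])^2))

# On a linear axis hour 23 is far from hour 0; on the circle it is
# exactly as close to hour 0 as hour 1 is.
if (interactive()) plot(pts$x, pts$y, asp = 1, type = "b")
```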

Twitter Math Puzzle and Solution

07.07.2011

Yesterday I posted a very simple math puzzle to Twitter that I found in Jonathan Baron’s book, Thinking and Deciding. The puzzle is the following: Show that every number of the form ABC,ABC is divisible by 13. The puzzle comes up in Baron’s book as an example of an “insight problem” in which one goes from not knowing the answer at all t...
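
One standard argument: ABC,ABC = 1000 × ABC + ABC = 1001 × ABC, and 1001 = 7 × 11 × 13, so every such number is divisible by 13 (and by 7 and 11). A brute-force check over every three-digit ABC in R confirms it:

```r
# Every three-digit block ABC (A is non-zero).
abc <- 100:999
# Writing ABC twice in a row is the same as ABC * 1000 + ABC.
abcabc <- abc * 1000 + abc

# The identity behind the puzzle.
stopifnot(all(abcabc == abc * 1001), 1001 == 7 * 11 * 13)

all(abcabc %% 13 == 0)
#> [1] TRUE
```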

The Psychology of Music and the ‘tuneR’ Package

25.10.2011

Introduction
This semester I’m TA’ing a course on the Psychology of Music taught by Phil Johnson-Laird. It’s been a great course to teach because (i) so much of the material is new to me and (ii) the study of the psychology of music brings together so many of the intellectual tools I enjoy, including music theory, psychophysics and ...

Using Sparse Matrices in R

31.10.2011

Introduction
I’ve recently been working with a couple of large, extremely sparse data sets in R. This has pushed me to spend some time trying to master the CRAN packages that support sparse matrices. This post describes three of them: the Matrix, slam and glmnet packages. The first two packages provide data storage classes for sparse matrices, ...
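
A minimal sketch using the Matrix package, a recommended package that ships with R. It builds a matrix from (row, column, value) triplets with sparseMatrix(), which stores only the non-zero entries in a compressed-column dgCMatrix. The dimensions and values are made up for illustration. slam offers a similar triplet class, simple_triplet_matrix(), and glmnet() accepts a dgCMatrix as its x argument, but neither is exercised here.

```r
library(Matrix)

# Three non-zero entries in a 1000 x 10 matrix, given as triplets.
m <- sparseMatrix(
  i = c(1, 3, 1000),   # row indices
  j = c(2, 5, 10),     # column indices
  x = c(4.2, -1, 7),   # values
  dims = c(1000, 10)
)

# The dense equivalent stores all 10,000 cells.
dense <- as.matrix(m)
sparse_bytes <- as.numeric(object.size(m))
dense_bytes <- as.numeric(object.size(dense))

# Many familiar operations work directly on the sparse object.
col_totals <- colSums(m)
```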