Publications by John Myles White

Writing Better Statistical Programs in R

24.01.2013

A while back a friend asked me for advice about speeding up some R code that they’d written. Because they were running an extensive Monte Carlo simulation of a model they’d been developing, the poor performance of their code had become an impediment to their work. After I looked through their code, it was clear that the performance hurdles th...

6649 sym R (1835 sym/8 pcs) 4 img 4 tbl

Modes, Medians and Means: A Unifying Perspective

22.03.2013

Introduction / Warning: Any traditional introductory statistics course will teach students the definitions of modes, medians and means. But, because introductory courses can’t assume that students have much mathematical maturity, the close relationship between these three summary statistics can’t be made clear. This post tries to remedy that s...

8566 sym 2 img

Using Norms to Understand Linear Regression

22.03.2013

Introduction: In my last post, I described how we can derive modes, medians and means as three natural solutions to the problem of summarizing a list of numbers, \((x_1, x_2, \ldots, x_n)\), using a single number, \(s\). In particular, we measured the quality of different potential summaries in three different ways, which led us to modes, medians ...

8631 sym

What’s Next

09.05.2013

The last two weeks have been full of changes for me. For those who’ve been asking about what’s next, I thought I’d write up a quick summary of all the news. (1) I successfully defended my thesis this past Monday. Completing a Ph.D. has been a massive undertaking for the past five years, and it’s a major relief to be done. From now on I’...

1433 sym

Hopfield Networks in Julia

28.07.2013

As a fun side project last night, I decided to implement a basic package for working with Hopfield networks in Julia. Since I suspect many of the readers of this blog have never seen a Hopfield net before, let me explain what they are and what they can be used for. The short-and-skinny is that Hopfield networks were invented in the 1980s to de...

3378 sym R (315 sym/2 pcs) 2 img 1 tbl

September Talks

05.09.2013

To celebrate my last full month on the East Coast, I’m doing a bunch of talks. If you’re interested in hearing more about Julia or statistics in general, you might want to come out to one of the events I’ll be at: Julia Tutorial at DataGotham: On 9/12, Stefan and I will be giving a 3-hour-long, hands-on Julia tutorial as part of the Thursd...

1350 sym

Writing Type-Stable Code in Julia

06.12.2013

For many of the people I talk to, Julia’s main appeal is speed. But achieving peak performance in Julia requires that programmers absorb a few subtle concepts that are generally unfamiliar to users of weakly typed languages. One particularly subtle performance pitfall is the need to write type-stable code. Code is said to be type-stable if the ...

3443 sym R (4070 sym/5 pcs)

The Relationship between Vectorized and Devectorized Code

22.12.2013

Introduction: Some people have come to believe that Julia’s vectorized code is unusably slow. To correct this misconception, I outline a naive benchmark below that suggests that Julia’s vectorized code is, in fact, noticeably faster than R’s vectorized code. When experienced Julia programmers suggest that newcomers should consider devectoriz...

7884 sym R (2950 sym/7 pcs) 2 tbl

Data corruption in R 3.0.2 when using read.csv

29.01.2014

Introduction: It may be old news to some, but I just recently discovered that the automatic type inference system that R uses when parsing CSV files assumes that data sets will never contain 64-bit integer values. Specifically, if an integer value read from a CSV file is too large to fit in a 32-bit integer field without overflow, the column of data ...

3367 sym R (331 sym/4 pcs) 4 tbl