Publications by John Myles White
Playing with The Circular Law in Julia
Introduction: Statistically trained readers of this blog will be very familiar with the Central Limit Theorem, which describes the asymptotic sampling distribution of the mean of a random vector composed of IID variables. Some of the most interesting recent work in mathematics has been focused on the development of increasingly powerful proofs of ...
2459 sym R (355 sym/2 pcs) 2 img 1 tbl
EDA Before CDA
One Paragraph Summary: Always explore your data visually. Whatever specific hypothesis you have when you go out to collect data is likely to be worse than any of the hypotheses you’ll form after looking at just a few simple visualizations of that data. The most effective hypothesis testing framework in existence is the test of intraocular trauma...
2154 sym R (760 sym/2 pcs) 10 img 1 tbl
Overfitting
What do you think when you see a model like the one below? Does this strike you as a good model? Or as a bad model? There’s no right or wrong answer to this question, but I’d like to argue that models that are able to match white noise are typically bad things, especially when you don’t have a clear cross-validation paradigm that will allo...
2082 sym 2 img
The Shape of Floating Point Random Numbers
[Updated 10/18/2012: Fixed a typo in which mantissa was replaced with exponent.] Over the weekend, Viral Shah updated Julia’s implementation of randn() to give a 20% speed boost. Because we all wanted to test that this speed-up had not come at the expense of the validity of Julia’s RNG system, I spent some time this weekend trying to get test...
5400 sym 10 img
The State of Statistics in Julia
Updated 12.2.2012: Added sample output based on a suggestion from Stefan Karpinski. Introduction: Over the last few weeks, the Julia core team has rolled out a demo version of Julia’s package management system. While the Julia package system is still very much in beta, it nevertheless provides the first plausible way for non-expert users to see ...
4388 sym R (14484 sym/18 pcs) 9 tbl
A Cheap Criticism of p-Values
One of these days I am going to finish my series on problems with how NHST is used in the social sciences. Until then, I came up with a cheap criticism of p-values today. To make sense of my complaint, you’ll want to head over to Andy Gelman’s blog and read the comments on his recent blog post about p-values. Reading them makes one thing cl...
986 sym
What is Correctness for Statistical Software?
Introduction: A few months ago, Drew Conway and I gave a webcast that tried to teach people about the basic principles behind linear and logistic regression. To illustrate logistic regression, we worked through a series of progressively more complex spam detection problems. The simplest data set we used was the following: This data set has one cl...
5579 sym Python (359 sym/4 pcs) 4 img 2 tbl
Computers are Machines
When people try out Julia for the first time, many of them are worried by the following example:
    julia> factorial(n) = n == 0 ? 1 : n * factorial(n - 1)
    julia> factorial(20)
    2432902008176640000
    julia> factorial(21)
    -4249290049419214848
If you’re not familiar with computer architecture, this result is very troubling. Why would Ju...
4333 sym R (1258 sym/16 pcs) 8 tbl
Symbolic Differentiation in Julia
A Brief Introduction to Metaprogramming in Julia: In contrast to my previous post, which described one way in which Julia allows (and expects) the programmer to write code that directly employs the atomic operations offered by computers, this post is meant to introduce newcomers to some of Julia’s higher-level functions for metaprogramming. To m...
5483 sym R (3619 sym/26 pcs) 13 tbl
Americans Live Longer and Work Less
Today I saw an article on Hacker News entitled, “America’s CEOs Want You to Work Until You’re 70”. I was particularly surprised by this article appearing out of the blue because I take it for granted that America will eventually have to raise the retirement age to avoid bankruptcy. After reading the article, I wasn’t able to figure out ...
1421 sym 2 img