Publications by John Myles White

Back to Blogging

31.03.2012

If you’re subscribed to this blog, you’ve surely noticed the very long hiatus I’ve taken from writing over the last six months. I wish I’d kept up with blogging more faithfully this year, but, in my defense, I’ve been busy doing a few big things: I wrote a book with Drew Conway called Machine Learning for Hackers, which was published l...

2017 sym

Julia, I Love You

31.03.2012

Julia is a new language for scientific computing that is winning praise from a slew of very smart people, including Harlan Harris, Chris Fonnesbeck, Douglas Bates, Vince Buffalo and Shane Conway. As a language, it has lofty design goals, which, if attained, will make it noticeably superior to Matlab, R and Python for scientific programming. In th...

3220 sym R (209 sym/4 pcs) 2 tbl

Simulated Annealing in Julia

04.04.2012

Building Optimization Functions for Julia: In hopes of adding enough statistical functionality to Julia to make it usable for my day-to-day modeling projects, I’ve written a very basic implementation of the simulated annealing (SA) algorithm, which I’ve placed in the same JuliaVsR GitHub repository that I used for the code for my previous post...

5821 sym Python (3292 sym/6 pcs) 4 img 3 tbl

Comparing Julia and R’s Vocabularies

09.04.2012

While exploring the Julia manual recently, I realized that it might be helpful to put the basic vocabularies of Julia and R side-by-side for easy comparison. So I took Hadley Wickham’s R Vocabulary section from the book he’s putting together on the devtools wiki, put all of the functions Hadley listed into a CSV file, and proceede...

2692 sym 1 tbl

Floating Point Arithmetic and The Descent into Madness

13.04.2012

While I should confess upfront that I’ve always had a weaker command of the details of floating point arithmetic than I feel I ought to have, this sort of thing still blows my mind when I stumble upon it. These moments invariably make me realize that floating point math will simply never satisfy my naive hopes as a mathematician: 0.1 + 0....

1076 sym R (96 sym/2 pcs) 1 tbl

Implementing the Exact Binomial Test in Julia

14.04.2012

One major benefit of spending my time recently adding statistical functionality to Julia is that I’ve learned a lot about the inner guts of algorithmic null hypothesis significance testing. Implementing Welch’s two-sample t-test last week was a trivial task because of the symmetry of the null hypothesis, but implementing the exact binomial te...

4332 sym R (809 sym/6 pcs) 3 tbl

cumplyr: Extending the plyr Package to Handle Cross-Dependencies

03.05.2012

Introduction: For me, Hadley Wickham’s reshape and plyr packages are invaluable because they encapsulate omnipresent design patterns in statistical computing: reshape handles switching between the different possible representations of the same underlying data, while plyr automates what Hadley calls the Split-Apply-Combine strategy, in which you ...

6638 sym R (1512 sym/2 pcs) 14 tbl

Criticism 1 of NHST: Good Tools for Individual Researchers are not Good Tools for Research Communities

10.05.2012

Introduction: Over my years as a graduate student, I have built up a long list of complaints about the use of Null Hypothesis Significance Testing (NHST) in the empirical sciences. In the next few weeks, I’m planning to publish a series of blog posts, each of which will articulate one specific weakness of NHST. The weaknesses I will discuss are ...

9741 sym

Criticism 2 of NHST: NHST Conflates Rare Events with Evidence Against the Null Hypothesis

12.05.2012

Introduction: This is my second post in a series describing the weaknesses of the NHST paradigm. In the first post, I argued that NHST is a dangerous tool for a community of researchers because p-values cannot be interpreted properly without perfect knowledge of the research practices of other scientists, knowledge that we cannot hope to attain...

9216 sym

Criticism 3 of NHST: Essential Information is Lost When Transforming 2D Data into a 1D Measure

14.05.2012

Introduction: Continuing on with my series on the weaknesses of NHST, I’d like to focus on an issue that’s not specific to NHST, but rather one that’s relevant to all quantitative analysis: the destruction caused by an inappropriate reduction of dimensionality. In our case, we’ll be concerned with the loss of essential information caused b...

4612 sym 2 img