Publications by John Myles White

Criticism 4 of NHST: No Mechanism for Producing Substantive Cumulative Knowledge

18.05.2012

[Note to the Reader: This is a much rougher piece than the previous pieces because the argument is more complex. I ask that you please point out places where things are unclear and where claims are not rigorous.] In this fourth part of my series of criticisms of NHST, I’m going to focus on broad questions of epistemology: I want to ask what typ...

16393 sym R (68 sym/1 pcs)

The Great Julia RNG Refactor

21.06.2012

Many readers of this blog will know that I’m a big fan of Bayesian methods, in large part because automated inference tools like JAGS allow modelers to focus on the types of structure they want to extract from data rather than worry about the algorithmic details of how they will fit their models to data. For me, the ease with which we can const...

4780 sym Python (1098 sym/8 pcs) 4 tbl

Bayesian Nonparametrics in R

25.06.2012

On July 25th, I’ll be presenting at the Seattle R Meetup about implementing Bayesian nonparametrics in R. If you’re not sure what Bayesian nonparametric methods are, they’re a family of methods that allow you to fit traditional statistical models, such as mixture models or latent factor models, without having to fully specify the number of ...

2943 sym 14 img

Optimization Functions in Julia

09.07.2012

Over the last few weeks, I’ve made a concerted effort to develop a basic suite of optimization algorithms for Julia, so that Matlab programmers used to fminunc() and R programmers used to optim() can start moving code that needs simple optimization algorithms, such as L-BFGS and the Nelder-Mead method, over to Julia. A...

3538 sym R (1196 sym/2 pcs) 4 img 1 tbl

Criticism 5 of NHST: p-Values Measure Effort, Not Truth

17.07.2012

Introduction: In the third installment of my series of criticisms of NHST, I focused on the notion that a p-value is nothing more than a one-dimensional representation of a two-dimensional space in which (1) the measured size of an effect and (2) the precision of this measurement have been combined in such a way that we can never pull those two di...

6917 sym 2 img

Automatic Hyperparameter Tuning Methods

20.07.2012

At MSR this week, we had two very good talks on algorithmic methods for tuning the hyperparameters of machine learning models. Selecting appropriate settings for hyperparameters is a constant problem in machine learning, which is somewhat surprising given how much expertise the machine learning community has in optimization theory. I suspect ther...

5716 sym

My New Book: Developing, Deploying and Debugging Multi-Armed Bandit Algorithms

28.07.2012

I’m happy to announce that I’ve started writing a new book for O’Reilly, which will focus on teaching readers how to use Multi-Armed Bandit Algorithms to build better websites. My hope is that the book can help web developers build up an intuition for the core conundrum facing anyone who wants to build a successful business: you have to con...

3766 sym

The Social Dynamics of the R Core Team

12.08.2012

Recently a few members of R Core have indicated that part of what slows down the development of R as a language is that it has become increasingly difficult over the years to achieve consensus among the core developers of the language. Inspired by these claims, I decided to look into this issue quantitatively by measuring the quantity of commits ...

2258 sym 4 img 1 tbl

DataGotham

21.08.2012

As some of you may know already, I’m co-organizing an upcoming conference called DataGotham that’s taking place in September. To help spread the word about DataGotham, I’m cross-posting the most recent announcement below: We’d like to let you know about DataGotham: a celebration of New York City’s data community! http://datagotham.com ...

1591 sym

Will Data Scientists Be Replaced by Tools?

28.08.2012

The Quick-and-Dirty Summary: I was recently asked to participate in a proposed SXSW panel that will debate the question, “Will Data Scientists Be Replaced by Tools?” This post describes my current thinking on that question as a way of (1) convincing you to go vote for the panel’s inclusion in this year’s SXSW and (2) instigating a larger d...

5076 sym