Publications by John Mount

Don’t use stats::aggregate()

31.10.2015

When working with an analysis system (such as R) there are usually good reasons to prefer using functions from the “base” system over using functions from extension packages. However, base functions are sometimes locked into unfortunate design compromises that can now be avoided. In R’s case I would say: do not use stats::aggregate(). Rea...

1844 sym R (1845 sym/4 pcs)

Fast food, fast publication

08.11.2015

The following article is getting quite a lot of press right now: David Just and Brian Wansink (2015). Fast Food, Soft Drink, and Candy Intake is Unrelated to Body Mass Index for 95% of American Adults. Obesity Science & Practice, forthcoming (upcoming in a new pay-for-placement journal). Obviously it is a popular contrary position (some coverage...

3933 sym R (899 sym/1 pcs) 2 img

Free gradient boosting lecture

21.11.2015

We have always regretted that we didn’t get to cover gradient boosting in Practical Data Science with R (Manning 2014). To try to make up for that we are sharing (for free) our GBM lecture from our (paid) video course Introduction to Data Science. (link, all support material here). Please help us get the word out by sharing/Tweeting!

739 sym

Wald’s sequential analysis technique

10.12.2015

Microsoft Revolution Analytics has just posted our latest article on A/B testing: Wald’s graphical sequential inspection procedure. It is a fun appreciation of a really cool procedure and I hope you check it out. Figure 14, Section 6.4.2, page 111, Abraham Wald, Sequential Analysis, Dover 2004 (reprinting a 1947 edition).

729 sym 2 img

Sequential Analysis

11.12.2015

We here at Win-Vector LLC have been working through an ad-hoc series about A/B testing combining elements of both operations research and statistical points of view.
A dynamic programming solution to A/B test design
Why does designing a simple A/B test seem so complicated?
A clear picture of power and significance in A/B tests
Bandit Formulations for...

6995 sym 6 img

Practical Data Science with R examples

11.12.2015

One of the big points of Practical Data Science with R is to supply a large number of fully worked examples. Our intent has always been for readers to read the book and, if they wanted to follow up on a data set or technique, to find the matching worked examples in the project directory of our book support materials git repository. Some readers w...

2173 sym

An R function return and assignment puzzle

29.12.2015

Here is an R programming puzzle. What does the following code snippet actually do? And even harder: what does it mean? (See here for some material on the difference between what code does and what code means.)

    f <- function() { x <- 5 }
    f()

In R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree" the code appears to call the function f()...
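
The snippet quoted in this excerpt (define f as function() { x <- 5 }, then call f()) can be explored with a few lines of base R. This is a minimal sketch of the language behavior involved, not necessarily the explanation the full article gives. The names y and res are introduced here for illustration only; withVisible() is a base R function that reports a value together with its visibility flag.

```r
# Re-creation of the puzzle function from the excerpt.
f <- function() { x <- 5 }

# Called at top level, nothing is printed: an assignment returns its
# value invisibly, and the assignment is the last expression in f().
f()

# The value is still returned; capturing it or inspecting visibility shows this.
y <- f()                  # y is an illustrative name, not from the article
res <- withVisible(f())   # list with elements `value` and `visible`

print(y)            # [1] 5
print(res$visible)  # [1] FALSE
```

Wrapping the call in parentheses, as in (f()), or passing it to print() forces the returned value to be displayed.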

3254 sym 3 img

Some programming language theory in R

01.01.2016

Let’s take a break from statistics and data science to think a bit about programming language theory, and how the theory relates to the programming language used in the R analysis platform (the language is technically called “S”, but we are going to just call the whole analysis system “R”). Our reasoning is: if you want to work as a mod...

11798 sym R (2408 sym/12 pcs) 2 img

Using Excel versus using R

15.01.2016

Here is a video I made showing how R should not be considered “scarier” than Excel to analysts. One of the takeaway points: it is easier to email R procedures than Excel procedures. Win-Vector’s John Mount shows a simple analysis both in Excel and in R.

660 sym

Nina Zumel and John Mount part of R Day at Strata + Hadoop World in San Jose 2016

17.01.2016

Nina Zumel and I are honored to have been invited to be part of Strata + Hadoop World in San Jose 2016 R Day organized by RStudio and O’Reilly. We have written a lot on the topic of model validation in R and we are very excited to distill it down to an exciting tutorial. We put a lot of time and effort into preparing something like this. Help...

1448 sym