Publications by John Mount
Don’t use stats::aggregate()
When working with an analysis system (such as R) there are usually good reasons to prefer using functions from the “base” system over using functions from extension packages. However, base functions are sometimes locked into unfortunate design compromises that can now be avoided. In R’s case I would say: do not use stats::aggregate(). Rea...
Fast food, fast publication
The following article is getting quite a lot of press right now: David Just and Brian Wansink (2015). Fast Food, Soft Drink, and Candy Intake is Unrelated to Body Mass Index for 95% of American Adults. Obesity Science & Practice, forthcoming (upcoming in a new pay for placement journal). Obviously it is a popular contrary position (some coverage...
Free gradient boosting lecture
We have always regretted that we didn’t get to cover gradient boosting in Practical Data Science with R (Manning 2014). To try to make up for that we are sharing (for free) our GBM lecture from our (paid) video course Introduction to Data Science. (link, all support material here). Please help us get the word out by sharing/Tweeting!
Wald’s sequential analysis technique
Microsoft Revolution Analytics has just posted our latest article on A/B testing: Wald’s graphical sequential inspection procedure. It is a fun appreciation of a really cool procedure and I hope you check it out. Figure 14, Section 6.4.2, page 111, Abraham Wald, Sequential Analysis, Dover 2004 (reprinting a 1947 edition).
Sequential Analysis
We here at Win-Vector LLC have been working through an ad-hoc series about A/B testing combining elements of both operations research and statistical points of view. A dynamic programming solution to A/B test design Why does designing a simple A/B test seem so complicated? A clear picture of power and significance in A/B tests Bandit Formulations for...
Practical Data Science with R examples
One of the big points of Practical Data Science with R is to supply a large number of fully worked examples. Our intent has always been for readers to read the book and, if they wanted to follow up on a data set or technique, to find the matching worked examples in the project directory of our book support materials git repository. Some readers w...
An R function return and assignment puzzle
Here is an R programming puzzle. What does the following code snippet actually do? And even harder: what does it mean? (See here for some material on the difference between what code does and what code means.) f <- function() { x <- 5 }; f() In R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree" the code appears to call the function f()...
Some programming language theory in R
Let’s take a break from statistics and data science to think a bit about programming language theory, and how the theory relates to the programming language used in the R analysis platform (the language is technically called “S”, but we are going to just call the whole analysis system “R”). Our reasoning is: if you want to work as a mod...
Using Excel versus using R
Here is a video I made showing how R should not be considered “scarier” than Excel to analysts. One of the takeaway points: it is easier to email R procedures than Excel procedures. Win-Vector’s John Mount shows a simple analysis both in Excel and in R.
Nina Zumel and John Mount part of R Day at Strata + Hadoop World in San Jose 2016
Nina Zumel and I are honored to have been invited to be part of Strata + Hadoop World in San Jose 2016 R Day organized by RStudio and O’Reilly. We have written a lot on the topic of model validation in R and we are very excited to distill it down to an exciting tutorial. We put a lot of time and effort into preparing something like this. Help...