Publications by John Mount
An R function return and assignment puzzle
Here is an R programming puzzle. What does the following code snippet actually do? And, even harder: what does it mean? (See here for some material on the difference between what code does and what code means.)
f <- function() { x <- 5 }
f()
In R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree", the code appears to call the function f()...
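A small sketch of what the snippet does in current R (our own illustration, not taken from the original post): the assignment `x <- 5` is the last expression evaluated, so `f()` returns 5, but it returns it invisibly, which is why a bare `f()` at the console prints nothing. The local `x` exists only in the function's own evaluation environment. The names `res`, `y`, and `caller` are ours, chosen for illustration.

```r
# The puzzle function: its body is a single assignment.
f <- function() {
  x <- 5
}

# At top level this prints nothing: an assignment returns its value invisibly.
f()

# withVisible() (base R) reports both the value and the visibility flag.
res <- withVisible(f())
print(res$value)    # the returned value, 5
print(res$visible)  # FALSE: the value came back invisibly

# Parentheses (or print()) force the invisible value to be shown.
(f())

# The value can still be captured by the caller.
y <- f()
print(y)

# The assignment to x happened inside f's own frame, so calling f()
# from another environment does not create an x there.
caller <- new.env()
evalq(f(), envir = caller)
print(exists("x", envir = caller, inherits = FALSE))  # FALSE
```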
Some programming language theory in R
Let’s take a break from statistics and data science to think a bit about programming language theory, and how the theory relates to the programming language used in the R analysis platform (the language is technically called “S”, but we will simply call the whole analysis system “R”). Our reasoning is: if you want to work as a mod...
Using Excel versus using R
Here is a video I made showing why analysts should not consider R “scarier” than Excel. One of the takeaway points: it is easier to email R procedures than Excel procedures. Win-Vector’s John Mount shows a simple analysis both in Excel and in R.
Nina Zumel and John Mount part of R Day at Strata + Hadoop World in San Jose 2016
Nina Zumel and I are honored to have been invited to be part of R Day at Strata + Hadoop World San Jose 2016, organized by RStudio and O’Reilly. We have written a lot on the topic of model validation in R, and we are very excited to distill it down to an exciting tutorial. We put a lot of time and effort into preparing something like this. Help...
Prepping Data for Analysis using R
Nina and I are proud to share our lecture “Prepping Data for Analysis using R” from ODSC West 2015. (Video: Nina Zumel and John Mount, ODSC West 2015.) It is about 90 minutes long and covers much of the theory behind the vtreat data preparation library. We also have a GitHub repository including all the lecture materials here. Nina’s preview still (s...
Win-Vector data science mailing list (and a give-away!)
Win-Vector LLC is starting a data science mailing list that we would like you to sign up for. It is going to be a (deliberately infrequent) set of updates including Win-Vector LLC notices, upcoming speaking events, and data science products. To kick this off, we will be awarding 5 free permanent subscriptions to our video course “Introduction t...
Running R jobs quickly on many machines
As we demonstrated in “A gentle introduction to parallel computing in R”, one of the great things about R is how easy it is to take advantage of parallel processing capabilities to speed up calculation. In this note we will show how to move from running jobs on multiple CPUs/cores to running jobs on multiple machines (for even larger scaling and gr...
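One common route from many cores to many machines, sketched here as an assumption rather than as the post's exact setup, is the base `parallel` package: `makeCluster()` accepts either a number of local workers or a character vector of host names, and the same `parLapply()` call then spreads work over whichever cluster was built. The helper `run_on_cluster()` and the host names in the comment are hypothetical; remote hosts would need passwordless ssh and a matching R installation.

```r
library(parallel)

# Hypothetical helper: `workers` is either a count of local worker
# processes or a character vector of host names (one worker per entry).
run_on_cluster <- function(workers, tasks, fn) {
  cl <- makeCluster(workers, type = "PSOCK")
  on.exit(stopCluster(cl))   # always shut the workers down, even on error
  parLapply(cl, tasks, fn)   # apply fn to each task on the cluster
}

# Local run: two worker processes on this machine.
squares <- run_on_cluster(2, 1:8, function(i) i^2)
print(unlist(squares))

# Multi-machine version (hypothetical hosts, not run here):
# run_on_cluster(c("node1.example.com", "node2.example.com"),
#                1:8, function(i) i^2)
```

The design choice is that only the cluster specification changes between the single-machine and multi-machine cases; the task code stays the same.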
Shiny Developer Conference
Really enjoying RStudio’s Shiny Developer Conference (Stanford University, January 2016). Winston Chang just demonstrated profvis; it is really slick. You can profile code just by wrapping it in a profvis({}) block, and the results are exported as interactive HTML widgets. For example, running the R code below: if(!('profvis' %in% rownames(install...
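The post’s own example code is truncated, so here is our own minimal sketch of the “wrap it in a profvis({}) block” idea. It assumes the profvis package may or may not be installed, so it checks with `requireNamespace()` first; the helper name `profile_example()` and the workload inside the block are illustrative choices, made slow enough that the sampling profiler has something to record.

```r
# Hypothetical helper: profile a small workload if profvis is available.
profile_example <- function() {
  if (!requireNamespace("profvis", quietly = TRUE)) {
    message("profvis is not installed; try install.packages('profvis')")
    return(NULL)
  }
  profvis::profvis({
    # Simulated data and a linear model fit.
    d <- data.frame(x = runif(2e5))
    d$y <- 3 * d$x + rnorm(nrow(d))
    fit <- lm(y ~ x, data = d)
    # Extra sorting work so the profile has enough samples.
    sorted <- replicate(20, sort(runif(2e5)))
    summary(fit)
  })
}

p <- profile_example()
# Printing the widget opens the interactive profile viewer.
if (!is.null(p) && interactive()) print(p)
```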
Free video course: applied Bayesian A/B testing in R
As a “thank you” to our blog, mailing list, and Twitter followers (@WinVectorLLC), we at Win-Vector LLC have decided to re-release our formerly fee-based A/B testing video course as a free (advertisement-supported) video course here on YouTube. The course emphasizes how to design A/B tests using prior “guesstimates” of effect sizes (often ...
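To make “using a prior guesstimate” concrete, here is a generic beta-binomial sketch. It is not the course’s material. The conversion counts and the Beta(5, 95) prior (a belief that rates sit near 5%) are made-up assumptions for illustration. Each variant’s posterior is Beta(prior successes + conversions, prior failures + non-conversions), and Monte Carlo draws estimate the probability that B beats A.

```r
# Hypothetical observed data for two variants.
conv_a <- 40; n_a <- 1000
conv_b <- 65; n_b <- 1000

# Prior guesstimate: Beta(a0, b0), centered near a 5% conversion rate
# (an assumption chosen for illustration).
a0 <- 5; b0 <- 95

set.seed(2016)
draws <- 1e5
# Posterior draws of each variant's conversion rate.
p_a <- rbeta(draws, a0 + conv_a, b0 + n_a - conv_a)
p_b <- rbeta(draws, a0 + conv_b, b0 + n_b - conv_b)

# Posterior probability that B's rate exceeds A's.
prob_b_better <- mean(p_b > p_a)
# Posterior expected difference in conversion rate (B minus A).
expected_lift <- mean(p_b - p_a)

print(prob_b_better)
print(expected_lift)
```

A stronger prior (larger a0 + b0) pulls both posteriors toward the guesstimate, so more data is needed before the observed difference dominates.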
Databases in containers
Many readers reacted very positively to Nina Zumel’s article “Using PostgreSQL in R: A quick how-to.” Part of the reason is that she described an incredibly powerful data science pattern: using formerly expensive permanent system infrastructure as a simple transient tool. In her case, the tools were the data manipulation grammars SQL (S...