Publications by John Mount

Prepping Data for Analysis using R

20.01.2016

Nina and I are proud to share our lecture “Prepping Data for Analysis using R” (Nina Zumel and John Mount, ODSC West 2015). It is about 90 minutes long and covers a lot of the theory behind the vtreat data preparation library. We also have a GitHub repository including all the lecture materials here. Nina’s preview still (s...

Win-Vector data science mailing list (and a give-away!)

20.01.2016

Win-Vector LLC is starting a data science mailing list that we would like you to sign up for. It is going to be a (deliberately infrequent) set of updates including Win-Vector LLC notices, upcoming speaking events, and data science products. To kick this off we will be awarding 5 free permanent subscriptions to our video course “Introduction t...

Running R jobs quickly on many machines

22.01.2016

As we demonstrated in “A gentle introduction to parallel computing in R”, one of the great things about R is how easy it is to take advantage of parallel processing capabilities to speed up calculation. In this note we will show how to move from running jobs on multiple CPUs/cores to running jobs on multiple machines (for even larger scaling and gr...

Shiny Developer Conference

31.01.2016

Really enjoying RStudio’s Shiny Developer Conference | Stanford University | January 2016. Winston Chang just demonstrated profvis, really slick. You can profile code just by wrapping it in a profvis({}) block and the results are exported as interactive HTML widgets. For example, running the R code below: if(!('profvis' %in% rownames(install...

Free video course: applied Bayesian A/B testing in R

04.02.2016

As a “thank you” to our blog, mailing list, and Twitter followers (@WinVectorLLC) we at Win-Vector LLC have decided to re-release our formerly fee-based A/B testing video course as a free (advertisement supported) video course here on YouTube. The course emphasizes how to design A/B tests using prior “guestimates” of effect sizes (often ...

Databases in containers

08.02.2016

A great number of readers reacted very positively to Nina Zumel’s article Using PostgreSQL in R: A quick how-to. Part of the reason is that she described an incredibly powerful data science pattern: using a formerly expensive permanent system infrastructure as a simple transient tool. In her case the tools were the data manipulation grammars SQL (S...

More Shiny user showcase demonstrations

24.02.2016

We at Win-Vector LLC are very proud to announce that RStudio just inducted two more of our demonstration Shiny applications into their Shiny User Showcase gallery. Check out the gallery to see our demonstrations of: Finding the k in k-means; A/B test interactive design and analysis tool; The geometry of classifiers. RStudio (the authors of Shiny)...

Win-Vector video courses: price/status changes

02.03.2016

Win-Vector LLC has been offering a couple of online video courses on the topics of data science and A/B testing (both using R). These are high-quality courses, well worth the money and time needed to work through them closely (with all materials distributed on GitHub). Our current distributor is Udemy, which has just announced a unilateral ch...

Bend or break: strings in R

10.03.2016

A common complaint from new users of R is: the string processing notation is ugly. Using paste(,,sep='') to concatenate strings seems clumsy. You are never sure which regular expression dialect grep()/gsub() are really using. Remembering the difference between length() and nchar() is initially difficult. As always things can be improved by us...

More on preparing data

18.03.2016

The Microsoft Data Science User Group just sponsored Nina Zumel’s presentation “Preparing Data for Analysis Using R”. Microsoft saw Win-Vector LLC’s ODSC West 2015 presentation “Prepping Data for Analysis using R” and generously offered to sponsor improving it and disseminating it to a wider audience. We feel Nina really hit the bal...
