Publications by John Mount

Announcing the wrapr package for R

11.02.2017

Recently Dirk Eddelbuettel pointed out that our R function debugging wrappers would be more convenient if they were available in a low-dependency micro package dedicated to little else. Dirk is a very smart person, and like most R users we are deeply in his debt; so we (Nina Zumel and I) listened and immediately moved the wrappers into a ne...

1875 sym 2 img

The Zero Bug

21.02.2017

I am going to write about an insidious statistical, data analysis, and presentation fallacy I call “the zero bug” and the habits you need to cultivate to avoid it. The zero bug in a nutshell: common data aggregation tools often cannot “count to zero” from examples, and this causes problems. Please read on for what...

7924 sym R (2582 sym/8 pcs) 12 img
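
The zero bug summarized in the entry above can be sketched in base R. This is a minimal illustration and not code from the article: the `sales` data and the `stores` vector are hypothetical, and `table()` stands in for whatever aggregation tool is in use.

```r
# Hypothetical data: one row per sale. Store "C" exists but sold nothing.
sales <- data.frame(store = c("A", "A", "B"), stringsAsFactors = FALSE)
stores <- c("A", "B", "C")  # assumed full list of stores, known separately

# Naive aggregation only sees groups that appear in the data,
# so store "C" is silently missing instead of being counted as zero.
naive_counts <- table(sales$store)

# Declaring the full set of levels up front lets the count reach zero.
full_counts <- table(factor(sales$store, levels = stores))

print(naive_counts)
print(full_counts)
```

The naive table has no entry for store "C" at all, while the version built on an explicit factor reports a count of 0 for it. Any later average or plot over stores inherits whichever of the two views it is given.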

Iteration and closures in R

26.02.2017

I recently read an interesting thread on unexpected behavior in R when creating a list of functions in a loop or iteration. The issue is solved, but I am going to take the liberty of restating and slowing down the discussion of the problem (and fix) for clarity. The issue is: are references or values captured during iteration? Many users exp...

5128 sym R (788 sym/6 pcs) 2 img
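
The question raised in the entry above (are references or values captured during iteration?) can be sketched in base R. This is an assumed reconstruction of the general issue, not the code from the thread or the article; `make_fn` is a hypothetical helper, and the factory-plus-`force()` idiom is one common fix rather than necessarily the one the article recommends.

```r
# Functions created in a for loop all close over the same variable i,
# because a for loop does not create a new environment per iteration.
loop_fns <- list()
for (i in 1:3) {
  loop_fns[[i]] <- function() i
}
# Every function sees the final value of i.
loop_results <- vapply(loop_fns, function(f) f(), numeric(1))

# One common fix: build each function in its own environment and
# force the argument so its value is fixed at creation time.
make_fn <- function(k) {
  force(k)
  function() k
}
fixed_fns <- lapply(1:3, make_fn)
fixed_results <- vapply(fixed_fns, function(f) f(), numeric(1))

print(loop_results)   # 3 3 3
print(fixed_results)  # 1 2 3
```

The loop version captures a reference to the shared variable, so all three functions return 3. The factory version gives each function its own `k`, so the values 1, 2, 3 are preserved.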

wrapr: for sweet R code

01.03.2017

This article is on writing sweet R code using the wrapr package. The problem: consider the following R puzzle. You are given: a data.frame, the name of a column that you wish to find missing values (NA) in, and the name of a column to land the result. For instance:

    d <- data.frame(x = c(1, NA))
    print(d)
    #    x
    # 1  1
    # 2 NA
    cname <- '...

1823 sym R (659 sym/6 pcs) 2 img
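
For orientation, the puzzle in the entry above (flag the missing values of a column whose name is only known as a string, and land the result in another named column) can be solved in base R as sketched below. This is not the wrapr approach the article goes on to present; `source_col` and `result_col` are hypothetical names chosen for this illustration, since the excerpt is truncated before the article's own names appear.

```r
d <- data.frame(x = c(1, NA))

# Hypothetical parameter names and values for this sketch
source_col <- "x"       # column to check for missing values
result_col <- "x_isNA"  # column that receives the result

# `[[` accepts a column name held in a variable, so no code
# needs to be rewritten when the column names change.
d[[result_col]] <- is.na(d[[source_col]])

print(d)
```

The point of the puzzle is that `d$x` style access hard-codes the column name, while `[[` indexing takes it as data.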

vtreat: prepare data

03.03.2017

This article is on preparing data for modeling in R using vtreat. Our example: suppose we wish to work with some data. Our example task is to train a classification model for credit approval using the ranger implementation of the random forests method. We will take our data from John Ross Quinlan's re-processed “credit approval” dataset hoste...

6279 sym R (2408 sym/8 pcs) 6 img

replyr: Get a Grip on Big Data in R

05.03.2017

replyr is an R package that contains extensions, adaptations, and work-arounds to make remote R dplyr data sources (including big data systems such as Spark) behave more like local data. This allows the analyst to more easily develop and debug procedures that simultaneously work on a variety of data services (in-memory data.frame, SQLite, PostgreSQ...

2708 sym R (7503 sym/16 pcs) 2 img

Step-Debugging magrittr/dplyr Pipelines in R with wrapr and replyr

06.03.2017

In this screencast we demonstrate how to easily and effectively step-debug magrittr/dplyr pipelines in R using wrapr and replyr. Some of the big issues in trying to debug magrittr/dplyr pipelines include: Pipelines being large expressions that are hard to line-step into. Visibility of intermediate results. Localizing operations (in time and co...

972 sym

sigr: Simple Significance Reporting

07.03.2017

sigr is a simple R package that conveniently formats a few statistics and their significance tests. This allows the analyst to use the correct test no matter what modeling package or procedure they use. Model example: let’s take as our example the following linear relation between x and y:

    library('sigr')
    set.seed(353525)
    d <- data.frame(x= rn...

3558 sym R (3904 sym/32 pcs) 4 img

Some Win-Vector R packages

09.03.2017

This post concludes our mini-series of Win-Vector open source R packages. We end with WVPlots, a collection of ready-made ggplot2 plots we find handy. Please read on for a list of some of the Win-Vector LLC open-source R packages that we are pleased to share. For each package we have prepared a short introduction, so you can quickly check if a ...

805 sym 12 img

Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article

11.03.2017

We have updated the errata for Practical Data Science with R to reflect that it is no longer worth the effort to use the Java version of SQLScrewdriver as described. We are very sorry for any confusion, trouble, or wasted effort that bringing in Java software (something we are very familiar with, but forget that not everybody uses) has caused readers. Al...

1733 sym 2 img