Publications by John Mount

Thank You For The Very Nice Comment

16.08.2017

Somebody nice reached out and gave us this wonderful feedback on our new Supervised Learning in R: Regression (paid) video course: “Thanks for a wonderful course on DataCamp on XGBoost and Random forest. I was struggling with Xgboost earlier and Vtreat has made my life easy now :).” Supervised Learning in R: Regression covers a lot as it treats pre...

1484 sym 4 img

Is dplyr Easily Comprehensible?

19.08.2017

dplyr is one of the most popular R packages. It is powerful and important. But is it in fact easily comprehensible? dplyr makes sense to those of us who use it a lot. And we can teach part-time R users a lot of the common good use patterns. But is it an easy task to study and characterize dplyr itself? Please take our advanced dplyr quiz to ...

854 sym 2 img

Some Neat New R Notations

22.08.2017

The R package seplyr supplies a few neat new coding notations. The first notation is an operator called the “named map builder”. This is a cute notation that essentially does the job of stats::setNames(). It allows for code such as the following:
    library("seplyr")
    names <- c('a', 'b')
    n...

2386 sym R (190 sym/4 pcs) 2 img

wrapr: R Code Sweeteners

25.08.2017

wrapr is an R package that supplies powerful tools for writing and debugging R code. Primary wrapr services include:
    let()
    %.>% (dot arrow pipe)
    := (named map builder)
    λ() (anonymous function builder)
    DebugFnW()
let()
let() allows execution of arbitrary code with substituted variable names (note this is subtly different than binding values fo...

3554 sym R (1136 sym/9 pcs) 2 img

Neat New seplyr Feature: String Interpolation

28.08.2017

The R package seplyr has a neat new feature: the function seplyr::expand_expr() which implements what we call “the string algebra” or string expression interpolation. The function takes an expression of mixed terms, including: variables referring to names, quoted strings, and general expression terms. It then “de-quotes” all of the variab...

2863 sym R (2325 sym/16 pcs) 2 img

Why to use the replyr R package

31.08.2017

Recently I noticed that the R package sparklyr had the following odd behavior:
    suppressPackageStartupMessages(library("dplyr"))
    library("sparklyr")
    packageVersion("dplyr")
    #> [1] '0.7.2.9000'
    packageVersion("sparklyr")
    #> [1] '0.6.2'
    packageVersion("dbplyr")
    #> [1] '1.1.0.9000'
    sc <- spark_connect(master = 'local')
    #> * Using Spark: 2.1.0
    d <- d...

3414 sym R (545 sym/2 pcs) 4 img

Permutation Theory In Action

02.09.2017

While working on a large client project using Sparklyr and multinomial regression we recently ran into a problem: Apache Spark chooses the order of multinomial regression outcome targets, whereas R users are used to choosing the order of the targets (please see here for some details). So to make things more like R users expect, we need a way to t...

2340 sym R (682 sym/8 pcs)

It is Needlessly Difficult to Count Rows Using dplyr

03.09.2017

Question: how hard is it to count rows using the R package dplyr? Answer: surprisingly difficult. When trying to count rows using dplyr or dplyr controlled data-structures (remote tbls such as Sparklyr or dbplyr structures) one is sailing between Scylla and Charybdis. The task being to avoid dplyr corner-cases and irregularities (a few of which ...

3600 sym R (1609 sym/31 pcs) 2 img

My advice on dplyr::mutate()

22.09.2017

There are substantial differences between ad-hoc analyses (be they: machine learning research, data science contests, or other demonstrations) and production worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; obviously the idea of repro...

4768 sym R (1164 sym/4 pcs) 2 img

Upcoming data preparation and modeling article series

23.09.2017

I am pleased to announce that vtreat version 0.6.0 is now available to R users on CRAN. vtreat is an excellent way to prepare data for machine learning, statistical inference, and predictive analytic projects. If you are an R user we strongly suggest you incorporate vtreat into your projects. vtreat handles, in a statistically sound fashion: ...

2232 sym 2 img