Publications by John Mount
Thank You For The Very Nice Comment
Somebody nice reached out and gave us this wonderful feedback on our new Supervised Learning in R: Regression (paid) video course: “Thanks for a wonderful course on DataCamp on XGBoost and Random forest. I was struggling with Xgboost earlier and Vtreat has made my life easy now :).” Supervised Learning in R: Regression covers a lot as it treats pre...
1484 sym 4 img
Is dplyr Easily Comprehensible?
dplyr is one of the most popular R packages. It is powerful and important. But is it in fact easily comprehensible? dplyr makes sense to those of us who use it a lot. And we can teach part-time R users a lot of the common good use patterns. But is it an easy task to study and characterize dplyr itself? Please take our advanced dplyr quiz to ...
854 sym 2 img
Some Neat New R Notations
The R package seplyr supplies a few neat new coding notations. The first notation is an operator called the “named map builder”. This is a cute notation that essentially does the job of stats::setNames(). It allows for code such as the following: library("seplyr") names <- c('a', 'b') n...
2386 sym R (190 sym/4 pcs) 2 img
wrapr: R Code Sweeteners
wrapr is an R package that supplies powerful tools for writing and debugging R code. Primary wrapr services include: let(), %.>% (dot arrow pipe), := (named map builder), λ() (anonymous function builder), and DebugFnW(). let(): let() allows execution of arbitrary code with substituted variable names (note this is subtly different than binding values fo...
3554 sym R (1136 sym/9 pcs) 2 img
Neat New seplyr Feature: String Interpolation
The R package seplyr has a neat new feature: the function seplyr::expand_expr() which implements what we call “the string algebra” or string expression interpolation. The function takes an expression of mixed terms, including: variables referring to names, quoted strings, and general expression terms. It then “de-quotes” all of the variab...
2863 sym R (2325 sym/16 pcs) 2 img
Why to use the replyr R package
Recently I noticed that the R package sparklyr had the following odd behavior: suppressPackageStartupMessages(library("dplyr")) library("sparklyr") packageVersion("dplyr") #> [1] '0.7.2.9000' packageVersion("sparklyr") #> [1] '0.6.2' packageVersion("dbplyr") #> [1] '1.1.0.9000' sc <- spark_connect(master = 'local') #> * Using Spark: 2.1.0 d <- d...
3414 sym R (545 sym/2 pcs) 4 img
Permutation Theory In Action
While working on a large client project using sparklyr and multinomial regression, we recently ran into a problem: Apache Spark chooses the order of multinomial regression outcome targets, whereas R users are used to choosing the order of the targets (please see here for some details). So to make things more like R users expect, we need a way to t...
2340 sym R (682 sym/8 pcs)
It is Needlessly Difficult to Count Rows Using dplyr
Question: how hard is it to count rows using the R package dplyr? Answer: surprisingly difficult. When trying to count rows using dplyr or dplyr-controlled data structures (remote tbls such as sparklyr or dbplyr structures) one is sailing between Scylla and Charybdis. The task is to avoid dplyr corner-cases and irregularities (a few of which ...
3600 sym R (1609 sym/31 pcs) 2 img
My advice on dplyr::mutate()
There are substantial differences between ad-hoc analyses (be they machine learning research, data science contests, or other demonstrations) and production-worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; obviously the idea of repro...
4768 sym R (1164 sym/4 pcs) 2 img
Upcoming data preparation and modeling article series
I am pleased to announce that vtreat version 0.6.0 is now available to R users on CRAN. vtreat is an excellent way to prepare data for machine learning, statistical inference, and predictive analytic projects. If you are an R user we strongly suggest you incorporate vtreat into your projects. vtreat handles, in a statistically sound fashion: ...
2232 sym 2 img