Publications by John Mount

New vtreat Feature: Nested Model Bias Warning

11.01.2020

For quite a while we have been teaching that estimating variable re-encodings on the exact same data that is later naively used to train a model leads to an undesirable nested model bias. The vtreat package (both the R and Python versions) incorporates a cross-frame method that allows one to use all the training data both to build lea...

2330 sym R (919 sym/5 pcs)
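To make the nested model bias concrete, here is a minimal base-R sketch of the cross-frame idea. It assumes a simple mean-of-outcome ("impact") encoding of a single categorical column. It is not vtreat's implementation, and the helper name `cross_frame_encode()` and its `k` and `seed` parameters are hypothetical. The naive encoding lets each row's own outcome into its encoded value. The cross-frame version instead encodes each row using only statistics estimated on the other folds.

```r
# Hypothetical helper: out-of-fold ("cross-frame") mean-outcome encoding of
# one categorical column. An illustration only, not vtreat's implementation.
cross_frame_encode <- function(x, y, k = 5, seed = 2020) {
  set.seed(seed)  # note: this resets the global random number generator
  x <- as.character(x)
  n <- length(y)
  folds <- sample(rep_len(seq_len(k), n))  # random fold id for each row
  encoded <- numeric(n)
  for (f in seq_len(k)) {
    held_out <- folds == f
    # per-level outcome means estimated only on rows outside this fold
    level_means <- vapply(split(y[!held_out], x[!held_out]), mean, numeric(1))
    vals <- unname(level_means[x[held_out]])
    # levels never seen outside the fold fall back to the out-of-fold mean
    vals[is.na(vals)] <- mean(y[!held_out])
    encoded[held_out] <- vals
  }
  encoded
}

d <- data.frame(x = c("a", "a", "b", "b", "c", "c", "a", "b"),
                y = c(1, 2, 3, 4, 5, 6, 1, 3))
# naive encoding: each row's own outcome leaks into its encoded value
naive <- ave(d$y, d$x)
# cross-frame encoding: each row is encoded without seeing its own outcome
cross <- cross_frame_encode(d$x, d$y, k = 2)
```

A downstream model trained on `naive` tends to find that column more predictive on the training data than it will be on new data. Training on `cross` avoids that leak.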

sklearn Pipe Step Interface for vtreat

14.01.2020

We’ve been experimenting with this for a while, and the next release of the R vtreat package will have a back-port of the Python vtreat package's sklearn pipe step interface (in addition to the standard R interface). This means the user can easily express modeling intent by choosing between coder$fit_transform(train_data), coder$fit(train_data_cal)$tra...

2386 sym

unpack Your Values in R

20.01.2020

I would like to introduce an exciting feature in the upcoming 1.9.6 version of the wrapr R package: value unpacking. The unpacking notation is available if you install wrapr version 1.9.6 from GitHub: remotes::install_github("WinVector/wrapr") We will likely send this version to CRAN in a couple of weeks. Here is an example of the unpack ...

5694 sym

Using unpack to Manage Your R Environment

21.01.2020

In our last note we stated that unpack is a good tool for loading R RDS files into your working environment. Here is the idea expanded into a worked example.

    # remotes::install_github("WinVector/wrapr")
    library(wrapr)
    a <- 5
    b <- 7
    do_not_want <- 13
    # save the elements of our workspace we want
    saveRDS(as_named_list(a, b), 'example_data.RDS')
    # ...

1751 sym
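For comparison, here is a base-R sketch of the same save-and-restore pattern without wrapr. This is a swapped-in technique, not the post's method. `mget()` collects chosen variables into a named list, and `list2env()` assigns the restored list elements back into the global environment. The use of `tempfile()` instead of a fixed file name is an assumption, so that the sketch leaves no file behind.

```r
a <- 5
b <- 7
do_not_want <- 13

# save only the workspace variables we want, as a named list
path <- tempfile(fileext = ".RDS")
saveRDS(mget(c("a", "b")), path)

# simulate a fresh session by removing the variables
rm(a, b, do_not_want)

# restore the saved values into the global environment
restored <- readRDS(path)
invisible(list2env(restored, envir = globalenv()))
```

Unlike unpack's explicit notation, `list2env()` assigns every element of the list it is given. You have to control what goes into the saved list.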

Why we wrote wrapr to/unpack

22.01.2020

One reason we are developing the wrapr to/unpack methods is the following: we wanted to spruce up the R vtreat interface a bit. We had recently back-ported a Python sklearn Pipeline step style interface from the Python vtreat to R (announcement here). But that doesn’t mean we are not continuing to make enhancements to the R style interfaces, u...

2552 sym

wrapr 1.9.6 is now up on CRAN

26.01.2020

wrapr 1.9.6 is now up on CRAN. We unfortunately usually forget to say this, so: a big thank you to the staff and volunteers at CRAN. As part of this release Nina Zumel has streamlined the unpack vignette, picking and recommending specific notations for the unpack method. We are looking forward to using the new wrapr as_named_list/unpack pair to man...

942 sym 2 img

Data re-Shaping in R and in Python

28.01.2020

Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version. This reflects our opinion on the question “which is better for data science, R or Python?”: they both are great. So start with one, and exp...

1501 sym
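As a small illustration of the kind of reshaping these tutorials cover, here is a wide-to-long conversion using base R's `stats::reshape()`. This is a swapped-in base-R technique and not necessarily the tooling the tutorials use. The data and column names are hypothetical.

```r
# hypothetical wide data: one row per id, one column per measurement
wide <- data.frame(id = c(1, 2),
                   measure_a = c(10, 20),
                   measure_b = c(30, 40))

# wide -> long: one row per (id, measurement) pair
long <- reshape(wide,
                direction = "long",
                varying = c("measure_a", "measure_b"),
                v.names = "value",
                timevar = "measure",
                times = c("measure_a", "measure_b"),
                idvar = "id")
print(long)
```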

R Tip: Check What Repos You are Using

02.02.2020

In a lot of our R writing we casually say “install from CRAN using install.packages('PKGNAME')” or “update your packages by using update.packages(ask = FALSE, checkBuilt = TRUE) (and answering ‘no’ to all questions about compiling).” We recently became aware that for some users this isn’t complete advice. The above depends on your ...

3425 sym
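A minimal way to check which repositories `install.packages()` and `update.packages()` will consult is to inspect the `repos` option. The mirror URL in the message is only one common choice and is an assumption here, not advice from the original post.

```r
# Which repositories will install.packages() / update.packages() consult?
repos <- getOption("repos")
print(repos)

# "@CRAN@" is a placeholder meaning no CRAN mirror has been chosen yet
if (is.null(repos) || any(repos == "@CRAN@")) {
  message("No CRAN mirror set; for example: ",
          "options(repos = c(CRAN = \"https://cloud.r-project.org\"))")
}
```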

wrapr Update: Removing Some Under-Used Functions and Classes

04.02.2020

For the next version of the R package wrapr we are going to be removing a number of under-used functions/methods and classes. This update will likely happen in March 2020, and is the start of the wrapr 2.* series. Most of the items being removed are different abstractions for helping with function composition. We ended up moving most of our work...

1237 sym

New Data Scientist Stickers

05.02.2020

We have a new data scientist sticker! If you see Nina or John at a conference/MeetUp, please ask us for a sticker!

517 sym 2 img