Publications by John Mount
Why we wrote wrapr to/unpack
One reason we are developing the wrapr to/unpack methods is the following: we wanted to spruce up the R vtreat interface a bit. We had recently back-ported a Python sklearn Pipeline step style interface from the Python vtreat to R (announcement here). But that doesn’t mean we are not continuing to make enhancements to the R style interfaces, u...
2552 sym
wrapr 1.9.6 is now up on CRAN
wrapr 1.9.6 is now up on CRAN. We unfortunately usually forget to say this: a big thank you to the staff and volunteers at CRAN. As part of this release Nina Zumel has streamlined the unpack vignette, picking and recommending specific notations for the unpack method. We are looking forward to using the new wrapr as_named_list/unpack pair to man...
942 sym 2 img
Data re-Shaping in R and in Python
Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version of the tutorial. This reflects our opinion on the question “which is better for data science, R or Python?”: they are both great. So start with one, and exp...
1501 sym
R Tip: Check What Repos You are Using
In a lot of our R writing we casually say “install from CRAN using install.packages('PKGNAME')” or “update your packages by using update.packages(ask = FALSE, checkBuilt = TRUE) (and answering ‘no’ to all questions about compiling).” We recently became aware that for some users this isn’t complete advice. The above depends on your ...
3425 sym
wrapr Update: Removing Some Under-Used Functions and Classes
For the next version of the R package wrapr we are going to be removing a number of under-used functions/methods and classes. This update will likely happen in March 2020, and is the start of the wrapr 2.* series. Most of the items being removed are different abstractions for helping with function composition. We ended up moving most of our work...
1237 sym
New Data Scientist Stickers
We have a new data scientist sticker! If you see Nina or John at a conference/MeetUp, please ask us for a sticker!
517 sym 2 img
New improved cdata instructional video
We have a new, improved version of the “how to design a cdata/data_algebra data transform” video up! The original article, the Python example, and the R example have all been updated to use the new video. Please check it out!
621 sym
What is New For vtreat 1.5.2?
vtreat version 1.5.2 just became available from CRAN. We have logged a few improvements in the NEWS. The changes are small and incremental, as the package is already in a great stable state for production use. One of the biggest improvements is documentation clean up, and adapting the examples to use wrapr unpack/to multiple assignment notatio...
844 sym
Nifty Upcoming Enhancements to unpack/to
We have some really nifty upcoming enhancements to wrapr unpack/to. One of the new notations is the use of := as an alternate assignment operator for unpack/to. This lets us write code like the following. First let’s attach our package and set up some example data. library(wrapr) # attach package packageVersion("wrapr") # confirm we have at...
3688 sym 1 tbl
Cross-Methods are a Leak/Variance Trade-Off
We have a new Win Vector data science article to share: Cross-Methods are a Leak/Variance Trade-Off John Mount (Win Vector LLC), Nina Zumel (Win Vector LLC) March 10, 2020 We work some exciting examples of when cross-methods (cross validation, and also cross-frames) work, and when they do not work. Abstract Cross-methods such as cross-valida...
1689 sym