Publications by John Mount

Why to try Practical Data Science with R, 2nd Edition

22.12.2019

I thought we would try to express why somebody interested in using the R language (and package ecosystem) for supervised machine learning, data wrangling, analytics projects, and other data science topics should give Practical Data Science with R, 2nd Edition a try. Nina Zumel and I shared the book with two incredible data scientists (Jeremy Howa...


What is a Second Edition?

24.12.2019

What is a second edition of a book to its authors? In some sense it is the book the authors thought they were writing the first time. With some good fortune a second edition can be much more than that. For our example: Nina and I received a lot of positive and useful feedback from people who used the first edition of Practical Data Science wi...


Introduction to Data Science in R, Free for 3 days

30.12.2019

To celebrate the new year and the recent release of Practical Data Science with R 2nd Edition, we are offering a free coupon for our video course “Introduction to Data Science.” The following URL and code should get you permanent free access to the video course, if used between now and January 1st 2020: https://www.udemy.com/course/introduct...


New Timings for a Grouped In-Place Aggregation Task

02.01.2020

I’d like to share some new timings on a grouped in-place aggregation task. A client of mine was seeing slow performance, so I decided to time a very simple abstraction of one of the steps of their workflow. Roughly, the task was to add some derived per-group aggregation columns to a data set of a few million rows. In the application the gro...
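As a rough illustration of the kind of task described above (adding derived per-group aggregation columns to a data frame), here is a minimal base R sketch. The data, the column names `g` and `x`, and the choice of `ave()` are all assumptions made for illustration; they are not the client's data or the implementations timed in the post.

```r
# Hypothetical example data: the column names `g` (group key) and `x` (value)
# are assumptions for illustration only.
set.seed(2020)
d <- data.frame(
  g = sample(c("a", "b", "c"), size = 10, replace = TRUE),
  x = runif(10),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each group and returns a vector
# aligned with the original rows, so it can be assigned as a new column.
d$group_sum <- ave(d$x, d$g, FUN = sum)
d$group_mean <- ave(d$x, d$g, FUN = mean)

# A derived column built from a per-group aggregate.
d$fraction_of_group <- d$x / d$group_sum

print(d)
```

The row count is unchanged; each row simply gains columns that summarize its group.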


Manning Deal of the Day January 3, 2020 : Half off Practical Data Science with R, Second Edition

02.01.2020

Manning Deal of the Day January 3, 2020 : Half off Practical Data Science with R, Second Edition. Use code dotd010320au at http://bit.ly/39vD1G4 Please share!


New Year’s Resolution 2020: Work on more R Data Science Projects

04.01.2020

We had such a positive reception to our last Introduction to Data Science promotion that we are going to try to make the course available to more people by lowering the base price to $29.99. We are also creating a one-month promotional price of $20.99. To get a permanent subscription to the course for less than $21 just visit this link https://...


New vtreat Feature: Nested Model Bias Warning

11.01.2020

For quite a while we have been teaching that estimating variable re-encodings on the exact same data that is later naively used to train a model leads to an undesirable nested model bias. The vtreat package (both the R version and Python version) incorporates a cross-frame method that allows one to use all the training data both to build lea...


sklearn Pipe Step Interface for vtreat

14.01.2020

We’ve been experimenting with this for a while, and the next R vtreat package will have a back-port of the Python vtreat package sklearn pipe step interface (in addition to the standard R interface). This means the user can easily express modeling intent by choosing between coder$fit_transform(train_data), coder$fit(train_data_cal)$tra...


unpack Your Values in R

20.01.2020

I would like to introduce an exciting feature in the upcoming 1.9.6 version of the wrapr R package: value unpacking. The unpacking notation is made available if you install wrapr version 1.9.6 from GitHub: remotes::install_github("WinVector/wrapr") We will likely send this version to CRAN in a couple of weeks. Here is an example of the unpack ...


Using unpack to Manage Your R Environment

21.01.2020

In our last note we stated that unpack is a good tool for loading R RDS files into your working environment. Here is the idea expanded into a worked example. # remotes::install_github("WinVector/wrapr") library(wrapr) a <- 5 b <- 7 do_not_want <- 13 # save the elements of our workspace we want saveRDS(as_named_list(a, b), 'example_data.RDS') # ...
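For comparison, the same save-and-restore round trip can be sketched in base R alone. This is not the wrapr unpack notation from the post (the excerpt above is cut off before that step); it swaps in `list2env()` as a stand-in, and the temporary file path is an assumption for illustration.

```r
a <- 5
b <- 7
do_not_want <- 13

# Save only the values we want, as a named list.
# The temporary path is hypothetical; the post writes 'example_data.RDS'.
path <- tempfile(fileext = ".RDS")
saveRDS(list(a = a, b = b), path)

# Clear the workspace values to simulate a fresh session.
rm(a, b, do_not_want)

# Read the named list back and copy its elements into the current environment.
# Note: list2env() restores every element of the list; it does not check
# that the names found are the names you expected.
vals <- readRDS(path)
list2env(vals, envir = environment())

print(a)
print(b)
```

One design difference worth noting: `list2env()` writes whatever names the file happens to contain, so it gives no explicit statement in the code of which variables are being created.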
