Publications by John Mount

The Advantages of Record Transform Specifications

18.09.2019

Nina Zumel had a really great article on how to prepare a nice Keras performance plot using R. I will use this example to show some of the advantages of cdata record transform specifications. The model performance data from Keras is in the following format: # R code library(wrapr) df <- wrapr::build_frame( "val_loss" , "val_acc", "loss" ,...

2515 sym R (6281 sym/19 pcs) 2 img 2 tbl

Preparing Data for Supervised Classification

24.09.2019

Nina Zumel has been polishing up new vtreat for Python documentation and tutorials. They are coming out so well that, to be fair to the R community, I must start back-porting this new documentation to vtreat for R. vtreat is a package for systematically preparing data for supervised machine learning tasks such as classification or regressi...

2178 sym

How to Prepare Data

26.09.2019

Real-world data can present a number of challenges to data science workflows. Even properly structured data (each interesting measurement already landed in distinct columns) can present problems, such as missing values and high-cardinality categorical variables. In this note we describe some great tools for working with such data. For an examp...

3678 sym R (312 sym/1 pcs)

New vtreat Documentation (Starting with Multinomial Classification)

01.10.2019

Nina Zumel finished some great new documentation showing how to use Python vtreat to prepare data for multinomial classification mode. And I have finally finished porting the documentation to R vtreat. So we now have good introductions on how to use vtreat to prepare data for the common tasks of: Regression: R regression example, Python regres...

1550 sym

You Can Override Just About Anything in R

02.10.2019

To understand computations in R, two slogans are helpful: “Everything that exists is an object. Everything that happens is a function call.” (John Chambers) In R, the “[” array access operator is a function call, and it is one a user can re-bind to a new effect of their own choosing. Let’s see what sort of mischief we can get into using t...

3571 sym

vtreat Cross Validation

05.10.2019

Nina Zumel finished new documentation on how vtreat’s cross validation works, which I want to share here. vtreat is a system that makes data preparation for machine learning a “one-liner” (available in R or available in Python). We have a set of starting off points here. These documents describe what vtreat does for you, you just find the...

1189 sym

Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

15.10.2019

We are excited to share a free extract of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019: Evaluating a Classification Model with a Spam Filter. This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction. It is funny, but it takes some effort...

1909 sym 2 img

Practical Data Science with R 2nd Edition update

17.10.2019

We are in the last stages of proofing the galleys/typesetting of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019. So this edition will definitely be out soon! If you ever wanted to see what Nina Zumel and John Mount are like when we have the help of editors, this book is your chance! One thing I noticed in working through ...

855 sym

New Introduction to rquery

27.10.2019

Introduction: rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform(), or dplyr’s dplyr::mutate(), and uses a pipe in the style popularized in R with magrittr. The operators themselves follow the selections in Codd’s relational algebr...

8572 sym R (2625 sym/18 pcs) 16 tbl

Practical Data Science with R, 2nd Edition, IS OUT!!!!!!!

15.11.2019

Practical Data Science with R, 2nd Edition author Dr. Nina Zumel, with a fresh author’s copy of her book!

508 sym 2 img