Publications by John Mount

Coming up: principal components analysis

07.05.2016

Just a “heads-up.” I’ve been editing a two-part series Nina Zumel is writing on some of the pitfalls of improperly applied principal components analysis/regression and how to avoid them (we are using the plural spelling, following Everitt’s The Cambridge Dictionary of Statistics). The series is looking absolutely fantastic and I th...

For a short time: Half Off Some Manning Data Science Books

12.05.2016

Our publisher Manning Publications is celebrating the release of a new Python data science title, Introducing Data Science, by offering it and other Manning titles at half off until Wednesday, May 18. As part of the promotion you can also use the supplied discount code mlcielenlt for half off some R titles including R in Action, Second Edition ...

Installing WVPlots and “knitting R markdown”

20.05.2016

Some readers have been having a bit of trouble using devtools to install WVPlots. I thought I would write a note with a few instructions to help. These are things you should not have to do often, and things those of us already running R have stumbled through and forgotten about. First you will need installation (likely admin) privileges on your mac...

On ranger respect.unordered.factors

30.05.2016

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest package. Actually this appe...

A demonstration of vtreat data preparation

01.06.2016

This article demonstrates the use of the R vtreat variable preparation package followed by caret-controlled training. In previous writings we have gone to great lengths to document, explain and motivate vtreat. That necessarily gets long and unnecessarily feels complicated. In this example we are going to show what building a predictive mod...

Using geom_step

03.06.2016

geom_step is an interesting geom supplied by the R package ggplot2. It is an appropriate rendering option for financial market data and we will show how and why to use it in this article. Let’s take a simple example of plotting market data. In this case we are plotting the “ask price” (the publicly published price an item is available for ...
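
To make the “why” concrete: a quoted ask price stays in force until the next quote arrives, so a straight line drawn between quotes would suggest intermediate prices that were never offered. The sketch below is in Python rather than the article’s R, and the helpers `step_vertices` and `price_at` are hypothetical illustrations, not part of ggplot2. `step_vertices` builds the same horizontal-then-vertical path that geom_step draws with its default `direction = "hv"`.

```python
from bisect import bisect_right


def step_vertices(times, prices):
    """Return (xs, ys) vertices of an 'hv' step path.

    Each price is held horizontally until the next quote time,
    then the path jumps vertically to the new price.
    """
    xs, ys = [], []
    for i, (t, p) in enumerate(zip(times, prices)):
        if i > 0:
            # horizontal segment: the previous price is still in force at time t
            xs.append(t)
            ys.append(prices[i - 1])
        # vertical jump to the newly quoted price at time t
        xs.append(t)
        ys.append(p)
    return xs, ys


def price_at(times, prices, t):
    """Most recent quoted price at or before time t (None before the first quote).

    Assumes times is sorted in increasing order.
    """
    i = bisect_right(times, t)
    return prices[i - 1] if i > 0 else None


# Hypothetical quotes: ask price changes at times 0, 2 and 5.
times = [0, 2, 5]
prices = [10.0, 10.5, 10.2]
xs, ys = step_vertices(times, prices)
```

With these quotes, `price_at(times, prices, 3)` returns 10.5, the price actually on offer at time 3, whereas linear interpolation between the quotes at times 2 and 5 would report an unquoted 10.4.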

Free e-book: Exploring Data Science

08.06.2016

We are pleased to announce a new free e-book from Manning Publications: Exploring Data Science. Exploring Data Science is a collection of five chapters hand-picked by John Mount and Nina Zumel, introducing you to various areas in data science and explaining which methodologies work best for each. Exploring Data Science gives you a free sample o...

Why you should read Nina Zumel’s 3 part series on principal components analysis and regression

09.06.2016

Short form: Win-Vector LLC’s Dr. Nina Zumel has a three-part series on Principal Components Regression that we think is well worth your time. Part 1: the proper preparation of data (including scaling) and use of principal components analysis (particularly for supervised learning or regression). Part 2: the introduction of y-aware scaling to di...
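
Why scaling matters for principal components analysis can be shown briefly. PCA finds directions of maximum variance, so a variable measured in large units dominates the leading component regardless of how informative it is. The following is a minimal NumPy sketch under our own assumptions (the function name `principal_components` and the synthetic data are hypothetical), not code from the series, which uses R.

```python
import numpy as np


def principal_components(X, scale=True):
    """Return (loadings, explained_variance) for the columns of X.

    Columns are always centered; if scale is True they are also divided
    by their standard deviation so every variable has unit variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    # rows of Vt are the principal directions (loadings), largest variance first
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = s ** 2 / (X.shape[0] - 1)
    return Vt, explained_variance


# Hypothetical data: two independent variables, the second in units 1000x larger.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=500), 1000 * rng.normal(size=500)])

loadings_raw, var_raw = principal_components(X, scale=False)
loadings_scaled, var_scaled = principal_components(X, scale=True)
```

Without scaling, the first loading vector points almost entirely along the large-unit column. After scaling, the two components explain roughly equal variance (summing to 2, the number of unit-variance inputs), which is what two independent variables should show.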

y-aware scaling in context

22.06.2016

Nina Zumel introduced y-aware scaling in her recent article Principal Components Regression, Pt. 2: Y-Aware Methods. I really encourage you to read the article and add the technique to your repertoire. The method combines well with other methods and can drive better predictive modeling results. From feedback I am not sure everybody noticed that...
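
As we read Pt. 2, y-aware scaling replaces each input x_j with m_j·(x_j − mean(x_j)). Here m_j is the slope of a single-variable linear regression of y on x_j, so every scaled variable is expressed in units of y before principal components analysis. Below is a minimal NumPy sketch under that reading. The function name `y_aware_scale` and the toy data are hypothetical, and the original articles use R.

```python
import numpy as np


def y_aware_scale(X, y):
    """Scale each centered column of X by its one-variable regression slope on y.

    Returns the scaled matrix and the per-column slopes.
    Assumes no column is constant (its slope would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)   # center each input column
    yc = y - y.mean()         # center the outcome
    # least-squares slope of y ~ a + m * x_j, computed for every column j at once
    slopes = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)
    return Xc * slopes, slopes


# Hypothetical toy data: y depends only on the first column.
X = np.array([[1.0, 10.0], [2.0, 0.0], [3.0, 10.0], [4.0, 0.0]])
y = 2.0 * X[:, 0]
Z, slopes = y_aware_scale(X, y)
```

Here the first scaled column reproduces the centered outcome exactly, while the uninformative second column shrinks from ±5 to ±1, so it carries less weight in a subsequent PCA. In practice the means and slopes would be estimated on training data and reused on new data. The sketch fits and applies them to the same matrix only for brevity.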

vtreat version 0.5.26 released on CRAN

12.07.2016

Win-Vector LLC, Nina Zumel and I are pleased to announce that ‘vtreat’ version 0.5.26 has been released on CRAN. ‘vtreat’ is a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. (from the package documentation) ‘vtreat’ is an R package that incorporates a number of ...
