Publications by jmount

The Intercept Fallacy

07.09.2020

A common misunderstanding of linear regression and logistic regression is that the intercept encodes the unconditional mean or the training data prevalence. This is easily seen not to be the case. Consider the following example in R. library(wrapr) We set up our example data. # build our example data # modeling y as a function of...
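
As a quick illustration of the point, here is a minimal sketch with synthetic data (not the post's wrapr example): once there are other coefficients in play, the fitted intercept need not match the log-odds of the training prevalence.

set.seed(2020)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, size = 1, prob = plogis(2 * d$x + 1))
m <- glm(y ~ x, data = d, family = binomial())
# log-odds of the training prevalence
log(mean(d$y) / (1 - mean(d$y)))
# the fitted intercept: in general a different number
coef(m)[["(Intercept)"]]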


Data Science is a Science (Just Not the One You May Think)

10.09.2020

I am working on a promising new series of notes: common data science fallacies and pitfalls. (Probably still looking for a good name for the series!) I thought I would share a few thoughts on it, and hopefully not jinx it too badly. Data science is, for better or worse, an empirical field. Please consider the following partial definition of scie...


How to Pick an Optimal Utility Threshold Using the ROC Plot

10.10.2020

Nina Zumel just completed an excellent short sequence of articles on picking optimal utility thresholds to convert a continuous model score for a classification problem into a deployable classification rule: “Squeezing the Most Utility from Your Models” and “Estimating Uncertainty of Utility Curves.” This is very compatible with our advice to prefer con...
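
The idea can be sketched in a few lines of R. The per-decision utility values below are made up for illustration, not taken from Zumel's articles: sweep the candidate thresholds and keep the one maximizing total utility.

set.seed(2020)
score <- runif(1000)                       # model scores
y <- rbinom(1000, size = 1, prob = score)  # outcomes
# assumed utilities: true positive, false positive, true negative, false negative
u_tp <- 10; u_fp <- -5; u_tn <- 0; u_fn <- -1
thresholds <- sort(unique(score))
utility <- vapply(thresholds, function(t) {
  pred <- score >= t
  sum(ifelse(pred, ifelse(y == 1, u_tp, u_fp),
                   ifelse(y == 1, u_fn, u_tn)))
}, numeric(1))
# the threshold maximizing total utility on this data
thresholds[which.max(utility)]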


Tailored Models are Not The Same as Simple Corrections

11.10.2020

Let’s take a stab at our first note on a topic that is much easier to write now that we have pre-established the definitions of probability model homotopy. In this note we will discuss tailored probability models. These are models deliberately fit to training data that has an outcome prevalence equal to the expected outcome prevalence on the data they are t...
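
A minimal sketch of the distinction, with synthetic data rather than the post's example. The fit is deliberately misspecified (a quadratic truth, a linear model), which is the situation where a model tailored by re-weighting differs from a mere intercept correction.

set.seed(2020)
d <- data.frame(x = rnorm(1000))
d$y <- rbinom(1000, size = 1, prob = plogis(d$x + 0.5 * d$x^2 - 2))
# model fit at the training prevalence (misspecified on purpose)
m1 <- glm(y ~ x, data = d, family = binomial())
# model tailored to a 50/50 prevalence by up-weighting positives
# (quasibinomial avoids the non-integer weights warning)
w <- ifelse(d$y == 1, (1 - mean(d$y)) / mean(d$y), 1)
m2 <- glm(y ~ x, data = d, family = quasibinomial(), weights = w)
coef(m1)
coef(m2)  # the slope typically moves too, so this is not a pure intercept shift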


Model Homotopies in the Wild

12.10.2020

So are model homotopies commonly used? Yes, they are. As an example, consider glmnet: Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL http://www.jstatsoft.org/v33/i01/. From help(glmnet): library(glmnet) x = mat...
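
glmnet fits an entire path of models indexed by the penalty lambda, which is exactly a one-parameter family, or homotopy, of models. A minimal sketch in the style of the help(glmnet) example:

library(glmnet)
set.seed(2020)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- glmnet(x, y)   # one model per value of the penalty lambda
plot(fit)             # coefficient paths as lambda varies
coef(fit, s = 0.1)    # the model at a single point on the path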


Surgery on ROC Plots

13.10.2020

This note is a little break from our model homotopy series. I have a neat example where one combines two classifiers to get a better classifier using a method I am calling “ROC surgery.” In ROC surgery we look at multiple ROC plots and decide we want to cut out a section from one of the plots for use. It is a sensor fusion method to try and comb...
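
I won't reproduce the surgery itself here, but a minimal sketch of its starting point, with synthetic scores rather than the post's example: overlay the two ROC curves to see where each classifier dominates, which is where one would cut.

set.seed(2020)
y <- rbinom(1000, size = 1, prob = 0.5)
s1 <- plogis(2.0 * y + rnorm(1000))            # classifier 1 (made up)
s2 <- plogis(1.5 * y + rnorm(1000, sd = 0.7))  # classifier 2 (made up)
roc_points <- function(score, y) {
  ts <- sort(unique(score), decreasing = TRUE)
  data.frame(
    fpr = vapply(ts, function(t) mean(score[y == 0] >= t), numeric(1)),
    tpr = vapply(ts, function(t) mean(score[y == 1] >= t), numeric(1)))
}
r1 <- roc_points(s1, y)
r2 <- roc_points(s2, y)
plot(r1$fpr, r1$tpr, type = "l",
     xlab = "false positive rate", ylab = "true positive rate")
lines(r2$fpr, r2$tpr, lty = 2)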


The Shift and Balance Fallacies

15.10.2020

Two related fallacies I see in machine learning practice are the shift and balance fallacies (for an earlier simple fallacy, please see here). They involve thinking logistic regression has a simpler structure than it actually does, and also thinking logistic regression is a bit less powerful than it actually is. The fallacies are somewhat opp...
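
A minimal sketch of the balance side of the issue, with synthetic, well-specified data: re-balancing the classes essentially just shifts the logistic regression intercept, which a threshold change could have accomplished without re-fitting.

set.seed(2020)
d <- data.frame(x = rnorm(10000))
d$y <- rbinom(10000, size = 1, prob = plogis(d$x - 3))  # rare positives
m_raw <- glm(y ~ x, data = d, family = binomial())
# "balance" by up-weighting the rare class toward 50/50
w <- ifelse(d$y == 1, (1 - mean(d$y)) / mean(d$y), 1)
m_bal <- glm(y ~ x, data = d, family = quasibinomial(), weights = w)
coef(m_raw)
coef(m_bal)  # nearly the same slope; mostly the intercept shifts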


Your Lopsided Model is Out to Get You

26.10.2020

For classification problems I argue that one of the biggest steps you can take to improve the quality and utility of your models is to prefer models that return scores or probabilities instead of classification rules. Doing this also opens a second large opportunity for improvement: working with your domain experts to find new variables to lowe...
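
The core of the advice fits in a few lines. A minimal sketch with synthetic data:

set.seed(2020)
d <- data.frame(x = rnorm(500))
d$y <- rbinom(500, size = 1, prob = plogis(2 * d$x))
m <- glm(y ~ x, data = d, family = binomial())
score <- predict(m, newdata = d, type = "response")  # keep the probability
hard <- score >= 0.5  # hardening to a rule this early discards information

Keeping the score lets the decision threshold be chosen later, in light of the actual costs and benefits, instead of being frozen into the model.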


The Double Density Plot Contains a Lot of Useful Information

27.10.2020

The double density plot contains a lot of useful information. This is a plot that shows the distribution of a continuous model score, conditioned on the binary categorical outcome to be predicted. As with most density plots, the y-axis is an abstract quantity called density, scaled so that the area under each curve integrates to 1. An example is gi...
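
A minimal sketch of such a plot with ggplot2 and synthetic scores (Win-Vector's WVPlots package also supplies a DoubleDensityPlot convenience function):

library(ggplot2)
set.seed(2020)
d <- data.frame(y = rbinom(1000, size = 1, prob = 0.5))
d$score <- plogis(2 * d$y - 1 + rnorm(1000))  # synthetic model score
# density of the score, conditioned on the true outcome
ggplot(d, aes(x = score, color = as.factor(y))) +
  geom_density() +
  labs(color = "outcome")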


An Example of a Calibrated Model that is not Fully Calibrated

28.10.2020

In our last note we mentioned the possibility of “fully calibrated models.” This note gives an example of a probability model that is calibrated in the traditional sense, but not fully calibrated in a finer-grained sense. First let’s attach our packages and generate our example data in R. library(wrapr) d <- build_frame( "x1" , "x2", "y...
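
A minimal sketch of the distinction, using a tiny constructed data set rather than the post's build_frame example: a constant prediction equal to the grand mean is calibrated in the traditional sense (the outcome average matches the prediction it is conditioned on), yet badly mis-calibrated once we condition on x.

d <- data.frame(
  x = c(rep(0, 50), rep(1, 50)),
  y = c(rep(0, 40), rep(1, 10), rep(0, 10), rep(1, 40)))
d$pred <- mean(d$y)  # predict the grand mean everywhere
mean(d$pred - d$y)               # 0: calibrated in aggregate
tapply(d$pred - d$y, d$x, mean)  # +0.3 / -0.3: biased within each x group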
