Publications by jmount
Including ggplot2 Plots in Python Notebooks
For an article on A/B testing that I am preparing, I asked my partner Dr. Nina Zumel if she could do me a favor and write some code to produce the diagrams. She prepared an excellent parameterized diagram generator. However, being the author of the book Practical Data Science with R, she built it in R using ggplot2. This would be great, except the A...
Don’t Use Classification Rules for Classification Problems
There’s a common, yet easy-to-fix, mistake that I often see in machine learning and data science projects and teaching: using classification rules for classification problems. This statement is a bit of word-play which I will need to unroll a bit. However, the concrete advice is that you often get better results using models that return a contin...
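To make the word-play concrete, here is a minimal Python sketch (not taken from the article) contrasting a hard classification rule with a model that returns a score. The scores, labels, thresholds, and helper names are all made-up illustration values, and reading the truncated teaser as "score rather than hard decision" is our assumption.

```python
# Hypothetical scores from some probability model, with the true labels.
scores = [0.05, 0.20, 0.35, 0.55, 0.70, 0.95]
labels = [0, 0, 1, 0, 1, 1]

def decide(scores, threshold):
    """Turn continuous scores into hard 0/1 decisions at a chosen threshold."""
    return [int(s >= threshold) for s in scores]

def confusion(decisions, labels):
    """Count (true positives, false positives, false negatives)."""
    tp = sum(d == 1 and y == 1 for d, y in zip(decisions, labels))
    fp = sum(d == 1 and y == 0 for d, y in zip(decisions, labels))
    fn = sum(d == 0 and y == 1 for d, y in zip(decisions, labels))
    return tp, fp, fn

# A hard classification rule behaves like a threshold fixed once and for all.
fixed_rule = decide(scores, 0.5)
# Keeping the scores lets the threshold be moved later, for example
# when missing a positive becomes more costly than a false alarm.
cautious = decide(scores, 0.3)

print(confusion(fixed_rule, labels))
print(confusion(cautious, labels))
```

With only the hard decisions in hand, the second trade-off is no longer available; with the scores, it is a one-line change.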
Plotting Multiple Curves in Python
I have posted what I think is a really neat tutorial on how to plot multiple curves on a graph in Python, using seaborn and data_algebra. It is a great way to show some data shaping theory convenience functions we have developed. Please check it out.
What is Chapter 8 of Practical Data Science with R?
Chapter 8, “Advanced Data Preparation”, of Practical Data Science with R is a study in using the R vtreat package for advanced data preparation, and in cross-validated data preparation. It is the professionally edited, ready-to-cite version of an important data preparation methodology. An advantage being: a number of well documented result improvin...
Free vtreat Tutorial Videos
I would like to re-share links to our free vtreat data preparation system introduction videos, which show you what sort of machine learning problems vtreat can help you with. Python vtreat introduction video (PyData LA 2019), slides here. R vtreat introduction video (Why R? Foundation). The idea is: instead of attempting to automate all of mach...
An Example Where Square Loss of a Sigmoid Prediction is not Convex in the Parameters
I’ve added a worked R example of the non-convexity, with respect to model parameters, of square loss of a sigmoid-derived prediction here. This is finishing an example for our Python note “Why not Square Error for Classification?”. Reading that note will give a usable context and background for this diagram. The undesirable property is: s...
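As a small illustration of the kind of non-convexity the title refers to, here is a Python sketch of our own (it is not the worked R example from the post). It assumes a single made-up data point with x = 1 and y = 1 and a one-parameter model sigmoid(w * x), and checks the midpoint condition that any convex function must satisfy.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def square_loss(w, x=1.0, y=1.0):
    """Square loss of the prediction sigmoid(w * x) against outcome y."""
    return (y - sigmoid(w * x)) ** 2

# Two parameter values and their midpoint (illustrative choices).
w_a, w_b = -10.0, 0.0
w_mid = (w_a + w_b) / 2.0

# For a convex function, the loss at the midpoint can never exceed
# the average of the losses at the endpoints.
chord = (square_loss(w_a) + square_loss(w_b)) / 2.0
at_mid = square_loss(w_mid)
violates_convexity = at_mid > chord

print(at_mid, chord, violates_convexity)
```

The loss at the midpoint sits above the chord, so this loss is not convex in w: it flattens out toward 1 for very negative w instead of continuing to grow.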
New WVPlot: ROCPlotPairList
We have a new R WVPlots plot: ROCPlotPairList. It is useful for comparing the ROC/AUC of multiple models on the same data set.

library(WVPlots)
set.seed(34903490)
x1 <- rnorm(50)
x2 <- rnorm(length(x1))
x3 <- rnorm(length(x1))
y <- 0.2*x2^2 + 0.5*x2 + x1 + rnorm(length(x1))
frm <- data.frame( x1 = x1, x2 = x2, x3 = x3, yC = y >= as.n...
0.83 is a Special AUC
0.83 (or more precisely 5/6) is a special Area Under the Curve (AUC), which we will show in this note. For a classification problem a good probability model has two important properties: The model is well calibrated. When the model says there is a p-probability of being in the class, the item is in the class with a frequency close to p. The mod...
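One setting in which 5/6 arises can be checked by simulation. This is a sketch under our own assumptions, not necessarily the construction used in the note, since the teaser is truncated. Assume scores p are uniform on [0, 1] and each outcome is positive with probability p, so the model is perfectly calibrated. Then positives have score density 2p, negatives have density 2(1 - p), and the chance that a random positive outscores a random negative is the integral of 2a(2a - a^2) over [0, 1], which is 4/3 - 1/2 = 5/6. The helper name `auc`, the seed, and the sample size below are illustrative choices.

```python
import random

def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) identity; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sum of the 1-based ranks held by the positive examples.
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i])
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = random.Random(2024)
n = 100_000
# Uniform scores, with outcomes drawn so that P(y = 1 | p) = p.
p = [rng.random() for _ in range(n)]
y = [rng.random() < p_i for p_i in p]

estimate = auc(p, y)
print(estimate, 5 / 6)
```

The simulated AUC should land close to 0.8333, with the gap shrinking as the sample size grows.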