Publications by Max Kuhn

Simulated Annealing Feature Selection

12.01.2015

As previously mentioned, caret has two new feature selection routines based on genetic algorithms (GA) and simulated annealing (SA). The help pages for the two new functions give a detailed account of the options, syntax, etc. The package already has functions to conduct feature selection using simple filters as well as recursive feature elimina...

10105 sym R (4125 sym/13 pcs) 10 img

A Talk and Course in NYC Next Week

13.02.2015

I’ll be giving a talk on Tuesday, February 17 (7:00PM-9:00PM) that will be an overview of predictive modeling. It will not be highly technical and here is the current outline: “Predictive modeling” definition Some example applications A short overview and example How is this different from what statisticians already do? What can drive choic...

1079 sym 2 img

Slides from recent talks

21.04.2015

I’ve been buried in work lately but thought I’d share the slides from two recent talks. The first is from the Bay Area RUG. Since someone filmed the talks, I was waiting to post the slides. The video of my talk (“Greatest Hits R Mixtape”) isn’t available yet, so here are the slides. The second talk is from last week’s Thirteenth Annu...

853 sym 2 img

New caret Version (6.0-52)

22.07.2015

A new version of caret (6.0-52) is on CRAN. Here is the news file but the Cliff Notes are: sub-sampling for class imbalances is now integrated with train and is used inside of standard resampling. There are four methods available right now: up- and down-sampling, SMOTE, and ROSE. The help page has detailed information. Nine additional models w...

1709 sym

Feature Engineering versus Feature Extraction: Game On!

03.08.2015

“Feature engineering” is a fancy term for making sure that your predictors are encoded in the model in a manner that makes it as easy as possible for the model to achieve good performance. For example, if you have a date field as a predictor and there are larger differences in response for the weekends versus the weekdays, then encoding the ...

7324 sym R (885 sym/3 pcs) 10 img

C5.0 Class Probability Shrinkage

14.09.2015

(The image above has nothing to do with this post. It does, however, show the prize that my daughter won during a recent vacation to Virginia and how I got it back home). I was recently asked to explain a potential disconnect in C5.0 between the class probabilities shown in the terminal nodes and the values generated by the prediction code. Her...

3587 sym R (1302 sym/4 pcs) 2 img

C5.0 Class Probability Shrinkage

14.09.2015

(The image above has nothing to do with this post. It does, however, show the prize that my daughter won during a recent vacation to Virginia and how I got it back home). I was recently asked to explain a potential disconnect in C5.0 between the class probabilities shown in the terminal nodes and the values generated by the prediction code. Her...

3182 sym R (1322 sym/4 pcs) 2 img

In Search Of…

13.12.2015

Rafael Ladeira asked on github: I was wondering why it doesn’t implement some others algorithms for search for optimal tuning parameters. What would be the caveats of using a genetic algorithm , for instance, instead of grid or random search? Do you think using some of those powerful optimization algorithms for tuning parameters is a good ide...

6477 sym R (3082 sym/3 pcs) 14 img