Publications by Max Kuhn
A Tutorial and Talk at useR! 2014
I’ll be doing a morning tutorial at useR! at the end of June in Los Angeles. I’ve done this same presentation at the last few conferences and this will probably be the last time for this specific workshop. I will be including a copy of the book for those who take the tutorial and all the proceeds (minus book costs) will be donated to the Fo...
A Tutorial and Talk at useR! 2014 [Important Update]
See the update below. I’ll be doing a morning tutorial at useR! at the end of June in Los Angeles. I’ve done this same presentation at the last few conferences and this will probably be the last time for this specific workshop. The tutorial outline is: Conventions in R; Data splitting and estimating performance; Data pre-processing; Over-fitti...
New caret version with adaptive resampling
A new version of caret is on CRAN now. There are a number of bug fixes: A man page with the list of models available via train was added back into the package. See ?models. Thoralf Mildenberger found and fixed a bug in the variable importance calculation for neural network models. The output of varImp for pamr models was updated to clarify the ...
useR! 2014 Highlights
My talk went well; here are the slides and a link to the paper pre-print. Hadley Wickham gave an excellent tutorial on dplyr. Based on the talk I saw, I think I will take the data sets from the book and make some public visualizations on the Plotly website. There were a few presentations on interactive graphics that were very good (here, here an...
Some Thoughts on “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?”
Sorry for the blogging break. I’ve got a few planned for the next few weeks based on some work I’ve been doing. In the meantime, you should check out “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” by Manuel Fernandez-Delgado at JMLR. They took a large number of classifiers and ran them against a large n...
Solutions on github
See this page. We’re not done with them all, but chapters 3 and 4 are there and the regression chapters are not too far behind. The Rnw files (using knitr LaTeX) are there along with the corresponding pdf files. You may have better solutions than we have here and we would love to see them. You can do so by creating a pull request or, if you are...
Comparing Different Species of Cross-Validation
This is the first of two posts about the performance characteristics of resampling methods. I just had major shoulder surgery, but I’ve pre-seeded a few blog posts. More will come as I get better at one-handed typing. First, a review: Resampling methods, such as cross-validation (CV) and the bootstrap, can be used with predictive models to ge...
Comparing the Bootstrap and Cross-Validation
This is the second of two posts about the performance characteristics of resampling methods. The first post focused on the cross-validation techniques and this post mostly concerns the bootstrap. Recall from the last post: we have some simulations to evaluate the precision and bias of these methods. I simulated some regression data (so that I kn...
New Version of caret on CRAN
A new version of caret is on CRAN. Some recent features/changes: The license was changed to GPL >= 2 to accommodate new code from the GA package. New feature selection functions gafs and safs were added, along with helper functions and objects. The package HTML was updated to expand more about feature selection. I’ll talk more a...
Regression Solutions Available
The github page for the APM exercises has been updated with three new files for Chapters 6-8 (the section on regression). The classification section is in progress. Here’s one of our fancy-pants graphs: