Publications by Max Kuhn
In Search Of…
Rafael Ladeira asked on GitHub: I was wondering why it doesn’t implement some others algorithms for search for optimal tuning parameters. What would be the caveats of using a genetic algorithm, for instance, instead of grid or random search? Do you think using some of those powerful optimization algorithms for tuning parameters is a good ide...
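For readers unfamiliar with the two baseline strategies named in the question, here is a minimal sketch of grid search versus random search over two tuning parameters. It is written in plain Python rather than R or caret, and everything in it is an assumption for illustration: `toy_error` is a hypothetical stand-in for a resampled error estimate, and the parameter names and ranges are made up.

```python
import itertools
import random


def toy_error(log_cost, log_sigma):
    """Hypothetical stand-in for a resampled error estimate (lower is better)."""
    return (log_cost - 1.3) ** 2 + 0.5 * (log_sigma + 2.1) ** 2


def grid_search(objective, cost_values, sigma_values):
    """Evaluate every combination of the candidate values and keep the best."""
    candidates = itertools.product(cost_values, sigma_values)
    return min(candidates, key=lambda params: objective(*params))


def random_search(objective, cost_range, sigma_range, n_iter, seed=0):
    """Evaluate n_iter uniformly sampled candidates and keep the best."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(*cost_range), rng.uniform(*sigma_range))
        for _ in range(n_iter)
    ]
    return min(candidates, key=lambda params: objective(*params))


# Both searches spend the same budget of 16 model evaluations.
grid_best = grid_search(toy_error, [-2, 0, 2, 4], [-4, -2, 0, 2])
random_best = random_search(toy_error, (-2, 4), (-4, 2), n_iter=16)

print("grid best:  ", grid_best, toy_error(*grid_best))
print("random best:", random_best, toy_error(*random_best))
```

A genetic algorithm or other adaptive optimizer would replace the fixed candidate list with one that is updated from earlier evaluations; that variant is not shown here.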
Central Iowa R User Group Talk [Updated]
I’ll be giving a talk (“Applied Predictive Modeling”) to the Central Iowa R User Group on Thursday night from 6:00 PM to 8:00 PM (CST). It looks like it will be broadcast live on YouTube. The link is https://www.youtube.com/watch?v=99lnTku75Pc. Update: Here are my slides and code for the session.
Boston R User Group Talk [UPDATE]
I’ll be giving a talk to the Boston R User Group on Thursday, March 10th at 6:00 PM. The talk will be on rule-based regression models. The image above is the training/test set split for the data that I’ll be using to illustrate the models. Slides can be found here. Someone took video, and I will link to that if it is posted somewhere.
DataCamp Course
Zachary Deane-Mayer, who collaborates on caret, has put together a DataCamp course on Machine Learning in R. Zach and DataCamp did a great job of developing a course that is just right for people who are relatively new to R. The really cool thing about the course is that their system lets you execute the R code as the instructors walk you throu...
2016 UK Tour
I’ll be in the UK next week doing three talks in three days: First, I’ll be giving a talk at the London R-Ladies meetup on Monday October 3rd with perhaps the best title yet: Whose Scat Is That? An ‘Easily Digestible’ Introduction to Predictive Modeling and caret. On Tuesday, October 4th I’m giving a talk at the Cambridge RUG on tuni...
Working at RStudio
I’ve joined Hadley’s team at RStudio. Unsurprisingly, I’ll be working on some modeling-related R packages and infrastructure. It is very exciting and I’m looking forward to learning a lot and creating some cool things. I’ve had a great time doing drug discovery at Pfizer for the last 12ish years and I’ll miss working with everyone t...
943 sym 2 img
Do Resampling Estimates Have Low Correlation to the Truth? The Answer May Shock You.
One criticism that is often leveled against using resampling methods (such as cross-validation) to measure model performance is that there is no correlation between the CV results and the true error rate. Let’s look at this with some simulated data. While this assertion is often correct, there are a few reasons why you shouldn’t care. The S...
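One way to probe the criticism with simulated data can be sketched as follows. This is not the post's simulation: the data-generating model (`y = 1 + 2x` plus Gaussian noise), the sample sizes, and all function names are assumptions chosen to keep the example small. The sketch fits a straight line to many simulated training sets, records the 10-fold cross-validation RMSE for each, and compares those estimates with the RMSE on one large independent sample that stands in for the truth.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(model, xs, ys):
    """Root mean squared error of a fitted line on the given data."""
    a, b = model
    return (sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def simulate(n, rng):
    """Toy data (an assumption): y = 1 + 2x + standard normal noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def cv_rmse(xs, ys, k=10):
    """k-fold cross-validation estimate of RMSE (folds assigned by index mod k)."""
    scores = []
    for fold in range(k):
        held_out = [i for i in range(len(xs)) if i % k == fold]
        kept = [i for i in range(len(xs)) if i % k != fold]
        model = fit_line([xs[i] for i in kept], [ys[i] for i in kept])
        scores.append(rmse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(scores) / k


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


rng = random.Random(42)
big_x, big_y = simulate(5000, rng)  # large sample standing in for the "truth"

cv_estimates, true_errors = [], []
for _ in range(50):
    xs, ys = simulate(50, rng)
    cv_estimates.append(cv_rmse(xs, ys))
    true_errors.append(rmse(fit_line(xs, ys), big_x, big_y))

correlation = pearson(cv_estimates, true_errors)
print("correlation between CV and 'true' RMSE:", round(correlation, 3))
```

Plotting `cv_estimates` against `true_errors` is a natural next step; no particular value of the correlation is claimed here.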
Nested Resampling with rsample
A typical scheme for splitting the data when developing a predictive model is to create an initial split of the data into a training and test set. If resampling is used, it is executed on the training set where a series of binary splits is created. In rsample, we use the term analysis set for the data that are used to fit the model and the asses...
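The splitting scheme described in the excerpt can be sketched with plain index lists. This is an illustrative Python sketch, not rsample code: the function names, the choice of k-fold splits at both levels, and the row counts are assumptions. It borrows only the excerpt's vocabulary, where each split has an analysis set (used to fit) and an assessment set (used to measure performance).

```python
import random


def kfold_splits(indices, k, rng):
    """Return k (analysis, assessment) pairs of row-index lists."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i, assessment in enumerate(folds):
        analysis = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((analysis, assessment))
    return splits


def nested_splits(n_rows, outer_k, inner_k, seed=0):
    """Outer k-fold splits, each carrying inner splits of its own analysis set."""
    rng = random.Random(seed)
    nested = []
    for outer_analysis, outer_assessment in kfold_splits(range(n_rows), outer_k, rng):
        nested.append(
            {
                "analysis": outer_analysis,
                "assessment": outer_assessment,
                # Inner resampling only ever sees the outer analysis rows.
                "inner": kfold_splits(outer_analysis, inner_k, rng),
            }
        )
    return nested


splits = nested_splits(n_rows=20, outer_k=5, inner_k=3)
for outer in splits:
    print(len(outer["analysis"]), len(outer["assessment"]), len(outer["inner"]))
```

In this layout the inner splits would be used to choose tuning parameters, and the outer assessment rows, which the inner splits never touch, would be used to estimate the performance of the tuned model.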