Publications by Wicked Good Data - r
Partial Dependence Plots
It can be difficult to understand the functional relations between predictors and an outcome when using black box prediction methods like random forests. One way to investigate these relations is with partial dependence plots. These plots are graphical visualizations of the marginal effect of a given variable (or multiple variables) on an outcome...
4630 sym R (1836 sym/1 pcs) 2 img
Clustering Mixed Data Types in R
Clustering allows us to better understand how a sample might be composed of distinct subgroups given a set of variables. While many introductions to cluster analysis review a simple application using continuous variables, clustering data of mixed types (e.g., continuous, ordinal, and nominal) is often of interest. The following is an o...
9057 sym R (7197 sym/17 pcs) 4 img
Handling Class Imbalance with R and Caret – An Introduction
When faced with classification tasks in the real world, it can be challenging to deal with an outcome where one class heavily outweighs the other (a.k.a., imbalanced classes). The following will be a two-part post on some of the techniques that can help to improve prediction performance in the case of imbalanced classes using R and caret. This fi...
6953 sym R (4066 sym/8 pcs) 2 img
Handling Class Imbalance with R and Caret – Caveats when using the AUC
In my last post, I went over how weighting and sampling methods can help to improve predictive performance in the case of imbalanced classes. I also included an applied example with a simulated dataset that used the area under the ROC curve (AUC) as the evaluation metric. In this post, I will go over some issues to keep in mind when using the AUC...
6799 sym R (5435 sym/8 pcs) 6 img
Handling Class Imbalance with R and Caret – Caveats when using the AUC
In my last post, I went over how weighting and sampling methods can help to improve predictive performance in the case of imbalanced classes. I also included an applied example with a simulated dataset that used the area under the ROC curve (AUC) as the evaluation metric. In this post, I will go over some issues to keep in mind when using the AUC...
6391 sym R (5417 sym/8 pcs) 6 img