Publications by statcompute
A Summary of My Home-Brew Binning Algorithms for Scorecard Development
Thus far, I have published four different monotonic binning algorithms for the scorecard development and think that it might be a right timing to do a quick summary. R functions for these binning algorithms are also available on https://github.com/statcompute/MonotonicBinning. The first one was posted back in 2017 (https://statcompute.wordpress....
3429 sym
Deployment of Binning Outcomes in Production
In my previous post (https://statcompute.wordpress.com/2019/03/10/a-summary-of-my-home-brew-binning-algorithms-for-scorecard-development), I’ve shown different monotonic binning algorithm that I developed over time. However, these binning functions are all useless without a deployment vehicle in production. During the weekend, I finally had tim...
1628 sym
Monotonic Binning with GBM
In addition to monotonic binning algorithms introduced in my previous post (https://statcompute.wordpress.com/2019/03/10/a-summary-of-my-home-brew-binning-algorithms-for-scorecard-development), two more functions based on Generalized Boosted Regression Models have been added to my GitHub repository, gbm_bin() and gbmcv_bin(). The function gbm_bi...
1612 sym
Batch Processing of Monotonic Binning
In my GitHub repository (https://github.com/statcompute/MonotonicBinning), multiple R functions have been developed to implement the monotonic binning by using either iterative discretization or isotonic regression. With these functions, we can run the monotonic binning for one independent variable at a time. However, in a real-world production e...
2145 sym
Batch Deployment of WoE Transformations
After wrapping up the function batch_woe() today with the purpose to allow users to apply WoE transformations to many independent variables simultaneously, I have completed the development of major functions in the MOB package that can be usable for the model development in a production setting. The function batch_woe() basically is the wrapper ...
1604 sym R (617 sym/1 pcs)
Binning with Weights
After working on the MOB package, I received requests from multiple users if I can write a binning function that takes the weighting scheme into consideration. It is a legitimate request from the practical standpoint. For instance, in the development of fraud detection models, we often would sample down non-fraud cases given an extremely low freq...
2188 sym
More General Weighted Binning
You might be wondering what motivates me spending countless weekend hours on the MOB package. The answer is plain and simple. It is users that are driving the development work. After I published the wts_bin() function last week showing the impact of two-value weights on the monotonic binning outcome (https://statcompute.wordpress.com/2019/04/21/...
1767 sym
Why Use Weight of Evidence?
I had been asked why I spent so much effort on developing SAS macros and R functions to do monotonic binning for the WoE transformation, given the availability of other cutting-edge data mining algorithms that will automatically generate the prediction with whatever predictors fed in the model. Nonetheless, what really distinguishes a good modele...
2808 sym 2 img
Granular Weighted Binning by Generalized Boosted Model
In the post https://statcompute.wordpress.com/2019/04/27/more-general-weighted-binning, I’ve shown how to do the weighted binning with the function wqtl_bin() by the iterative partitioning. However, the outcome from wqtl_bin() sometimes can be too coarse. The function wgbm_bin() (https://github.com/statcompute/MonotonicBinning/blob/master/code/...
1149 sym 2 img
Faster Way to Slice Dataframe by Row
When we’d like to slice a dataframe by row, we can employ the split() function or the iter() function in the iterators package. By leveraging the power of parallelism, I wrote an utility function slice() to faster slice the dataframe. In the example shown below, the slice() is 3 times more efficient than the split() or the iter() to select 2 r...
804 sym R (1115 sym/1 pcs)