Publications by statcompute
WoE Transformation for Loss Given Default Models
In the intro section of my MOB package (https://github.com/statcompute/MonotonicBinning#introduction), reasons and benefits of using WoE transformations in the context of logistic regressions with binary outcomes have been discussed. What’s more, a similar idea can be easily generalized to other statistical models in the credit risk area, such...
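To make the WoE idea concrete, below is a minimal sketch of the weight-of-evidence calculation for a binned numeric variable. The simulated data, the quintile bins, and the ln(bad distribution / good distribution) sign convention are assumptions for illustration and are not taken from the MOB package.

```r
set.seed(1)
df <- data.frame(x = rnorm(1000))
df$bad <- rbinom(1000, 1, plogis(-1.5 + df$x))
df$good <- 1 - df$bad

# quintile bins, purely for illustration
df$bin <- cut(df$x, breaks = quantile(df$x, probs = seq(0, 1, 0.2)), include.lowest = TRUE)

# WoE per bin: log of the bad distribution over the good distribution
tbl <- aggregate(cbind(bad, good) ~ bin, data = df, FUN = sum)
tbl$woe <- log((tbl$bad / sum(tbl$bad)) / (tbl$good / sum(tbl$good)))
tbl
```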
Parallel R: Socket or Fork
In the R parallel package, there are two implementations of parallelism, namely fork and socket, each with its own pros and cons. With the fork, each parallel thread is a complete duplication of the master process and shares its environment, including objects or variables defined prior to the kickoff of the parallel threads. Therefore, it runs fast. However, the m...
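As a hedged illustration of the difference, the sketch below runs the same toy task with a socket (PSOCK) cluster and with a fork via mclapply from the parallel package; the task itself is made up, and the fork call only works on Unix-alike systems.

```r
library(parallel)

x <- rnorm(1e6)

# socket (PSOCK) cluster: workers are fresh R sessions, so objects defined
# in the master must be exported explicitly; works on all platforms
cl <- makeCluster(2, type = "PSOCK")
clusterExport(cl, "x")
r1 <- parSapply(cl, 1:2, function(i) mean(x) + i)
stopCluster(cl)

# fork: workers inherit the master environment via copy-on-write, so no
# export is needed and startup is cheap; Unix-alike systems only
r2 <- unlist(mclapply(1:2, function(i) mean(x) + i, mc.cores = 2))
```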
Latin Hypercube Sampling in Hyper-Parameter Optimization
In my previous post https://statcompute.wordpress.com/2019/02/03/sobol-sequence-vs-uniform-random-in-hyper-parameter-optimization/, I’ve shown the difference between uniform pseudo-random and quasi-random number generators in the hyper-parameter optimization of machine learning. Latin Hypercube Sampling (LHS) is another interesting way...
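For illustration, here is a minimal sketch of drawing a Latin Hypercube design with the lhs package and rescaling it to two hypothetical hyper-parameter ranges; the parameter names and ranges are assumptions, not the ones used in the post.

```r
library(lhs)

set.seed(1)
# 10 design points in the unit hypercube for 2 hyper-parameters
d <- randomLHS(10, 2)

# rescale each column to an assumed search range
grid <- data.frame(
  sigma = 0.1 + d[, 1] * (1.0 - 0.1),    # e.g. a smoothing parameter in [0.1, 1.0]
  mtry  = round(1 + d[, 2] * (10 - 1))   # e.g. an integer parameter in [1, 10]
)
grid
```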
Chunk Averaging of GLM
Chunk Average (CA) is an interesting concept proposed by Matloff in Chapter 13 of his book “Parallel Computing for Data Science”. The basic idea is to partition the entire model estimation sample into chunks and then to estimate a GLM for each chunk. Under the i.i.d. assumption, the CA estimator with the chunked data is asymptotically equi...
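A minimal sketch of the chunk-averaging idea is shown below: split the estimation sample into chunks, fit the same logit model on each chunk, and average the chunk-level coefficients. The simulated data and the four-chunk split are assumptions for illustration rather than Matloff's own example.

```r
set.seed(1)
n <- 10000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, 1, plogis(0.5 * df$x1 - 0.25 * df$x2))

# partition the sample into chunks
nchunk <- 4
idx <- split(sample(seq_len(n)), rep(1:nchunk, length.out = n))

# fit a glm on each chunk and average the coefficients across chunks
coefs <- sapply(idx, function(i) coef(glm(y ~ x1 + x2, data = df[i, ], family = binomial)))
ca_est <- rowMeans(coefs)

# compare with the estimate from the full sample
cbind(chunk_average = ca_est,
      full_sample = coef(glm(y ~ x1 + x2, data = df, family = binomial)))
```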
Monotonic Binning Driven by Decision Tree
After the development of the MOB package (https://github.com/statcompute/MonotonicBinning), I was asked by a couple of users about the possibility of using a decision tree to drive the monotonic binning. Although I am not aware of any R package implementing a decision tree with a monotonic constraint, I did manage to find a solution based upon the...
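The snippet below is not the solution referenced in the post, only a rough sketch of the general idea on assumed data: grow a shallow rpart tree on a single predictor, harvest the split points as bin boundaries, and then check whether the bad rate moves monotonically across the bins.

```r
library(rpart)

set.seed(1)
df <- data.frame(x = rnorm(5000))
df$y <- rbinom(5000, 1, plogis(df$x))

# shallow tree on the single predictor; the split points become cut-offs
tr <- rpart(factor(y) ~ x, data = df, method = "class",
            control = rpart.control(maxdepth = 3, cp = 0, minbucket = 500))
cuts <- sort(unique(tr$splits[, "index"]))

# bin the predictor with the harvested cut-offs and inspect the bad rate by bin
df$bin <- cut(df$x, breaks = c(-Inf, cuts, Inf))
tapply(df$y, df$bin, mean)
```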
Yet Another R Package for General Regression Neural Network
Compared with other types of neural networks, the General Regression Neural Network (Specht, 1991) is advantageous in several aspects. Being a universal approximation function, GRNN has only one tuning parameter to control the overall generalization. The network structure of GRNN is surprisingly simple, with only one hidden layer and the number of...
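To make the structure concrete, here is a minimal, assumed implementation of the GRNN prediction rule in its Nadaraya-Watson form, in which every training case acts as a hidden-layer neuron and sigma is the only tuning parameter; it is a sketch, not the code from the package discussed in the post.

```r
# minimal GRNN predictor: Gaussian-kernel weighted average of training outcomes
grnn_predict <- function(x_train, y_train, x_new, sigma) {
  apply(x_new, 1, function(z) {
    d2 <- rowSums(sweep(x_train, 2, z)^2)   # squared distance to every training case
    w  <- exp(-d2 / (2 * sigma^2))          # one kernel weight per hidden neuron
    sum(w * y_train) / sum(w)               # normalized weighted average
  })
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- x[, 1]^2 + x[, 2] + rnorm(100, sd = 0.1)
grnn_predict(x, y, matrix(c(0, 0, 1, 1), ncol = 2, byrow = TRUE), sigma = 0.5)
```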
Improve GRNN by Weighting
In the post (https://statcompute.wordpress.com/2019/07/14/yet-another-r-package-for-general-regression-neural-network), several advantages of the General Regression Neural Network (GRNN) have been discussed. However, as pointed out by Specht, a major weakness of GRNN is the high computational cost required for a GRNN to generate predicted values base...
Dummy Is As Dummy Does
In the 1975 edition of “Applied multiple regression/correlation analysis for the behavioral sciences” by Jacob Cohen, an interesting approach to handling missing values in numeric variables was proposed with the purpose of improving the traditional single-value imputation, as described below: – First of all, impute missing values by the value...
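The excerpt above is cut off before the fill-in value is named, so the sketch below simply assumes the variable mean as a stand-in; the mechanics, imputing the missing values and adding a missingness dummy to the regression (as the post title suggests), are what the sketch illustrates.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x = rnorm(100))
df$x[sample(100, 20)] <- NA

# flag the missingness with a dummy, then fill in the missing values
# (the mean is an assumed fill-in value, purely for illustration)
df$x_miss <- as.integer(is.na(df$x))
df$x_imp  <- ifelse(is.na(df$x), mean(df$x, na.rm = TRUE), df$x)

summary(lm(y ~ x_imp + x_miss, data = df))
```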
Develop Performance Benchmark with GRNN
It has been mentioned in https://github.com/statcompute/GRnnet that GRNN is an ideal approach for developing performance benchmarks for a variety of risk models. People might wonder what the purpose of performance benchmarks is and why we would even need one at all. Sometimes, a model developer has to answer questions about how well the mode...
Hyper-Parameter Optimization of General Regression Neural Networks
A major advantage of General Regression Neural Networks (GRNN) over other types of neural networks is that there is only a single hyper-parameter, namely the sigma. In the previous post (https://statcompute.wordpress.com/2019/07/06/latin-hypercube-sampling-in-hyper-parameter-optimization), I’ve shown how to use the random search strategy to fin...
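As a rough sketch of how the single sigma can be searched, the code below draws random sigma candidates, scores each on a holdout with the same minimal kernel-average form of GRNN shown earlier, and keeps the best one; it stands in for, rather than reproduces, the random search used in the post.

```r
set.seed(1)
x <- matrix(rnorm(400), ncol = 2)
y <- x[, 1]^2 + x[, 2] + rnorm(200, sd = 0.1)
trn <- sample(200, 150)

# kernel-average predictor with a single smoothing parameter sigma
pred <- function(s, xt, yt, xv) apply(xv, 1, function(z) {
  w <- exp(-rowSums(sweep(xt, 2, z)^2) / (2 * s^2))
  sum(w * yt) / sum(w)
})

# random candidates for sigma, scored by holdout mean squared error
sigmas <- runif(20, 0.1, 2)
mse <- sapply(sigmas, function(s) mean((y[-trn] - pred(s, x[trn, ], y[trn], x[-trn, ]))^2))
sigmas[which.min(mse)]
```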