Publications by statcompute
WoE Transformation for Loss Given Default Models
In the intro section of my MOB package (https://github.com/statcompute/MonotonicBinning#introduction), reasons and benefits of using WoE transformations in the context of logistic regressions with binary outcomes have been discussed. What’s more, a similar idea can be easily generalized to other statistical models in the credit risk area, such...
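To make the WoE idea concrete, below is a minimal sketch of the weight-of-evidence calculation for a binned numeric variable. The simulated data, the quintile bins, and the ln(bad distribution / good distribution) sign convention are assumptions for illustration and are not taken from the MOB package.

```r
set.seed(1)
df <- data.frame(x = rnorm(1000))
df$bad <- rbinom(1000, 1, plogis(-1.5 + df$x))
df$good <- 1 - df$bad

# quintile bins, purely for illustration
df$bin <- cut(df$x, breaks = quantile(df$x, probs = seq(0, 1, 0.2)), include.lowest = TRUE)

# WoE per bin: log of the bad distribution over the good distribution
tbl <- aggregate(cbind(bad, good) ~ bin, data = df, FUN = sum)
tbl$woe <- log((tbl$bad / sum(tbl$bad)) / (tbl$good / sum(tbl$good)))
tbl
```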
Parallel R: Socket or Fork
In the R parallel package, there are two implementations of parallelism, namely fork and socket, each with its own pros and cons. With the fork, each parallel thread is a complete duplication of the master process and shares its environment, including objects or variables defined prior to the kickoff of the parallel threads. Therefore, it runs fast. However, the m...
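As a hedged illustration of the difference, the sketch below runs the same toy task with a socket (PSOCK) cluster and with a fork via mclapply from the parallel package; the task itself is made up, and the fork call only works on Unix-alike systems.

```r
library(parallel)

x <- rnorm(1e6)

# socket (PSOCK) cluster: workers are fresh R sessions, so objects defined
# in the master must be exported explicitly; works on all platforms
cl <- makeCluster(2, type = "PSOCK")
clusterExport(cl, "x")
r1 <- parSapply(cl, 1:2, function(i) mean(x) + i)
stopCluster(cl)

# fork: workers inherit the master environment via copy-on-write, so no
# export is needed and startup is cheap; Unix-alike systems only
r2 <- unlist(mclapply(1:2, function(i) mean(x) + i, mc.cores = 2))
```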
Latin Hypercube Sampling in Hyper-Parameter Optimization
In my previous post https://statcompute.wordpress.com/2019/02/03/sobol-sequence-vs-uniform-random-in-hyper-parameter-optimization/, I’ve shown the difference between uniform pseudo-random and quasi-random number generators in the hyper-parameter optimization of machine learning. Latin Hypercube Sampling (LHS) is another interesting way...
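For illustration, here is a minimal sketch of drawing a Latin Hypercube design with the lhs package and rescaling it to two hypothetical hyper-parameter ranges; the parameter names and ranges are assumptions, not the ones used in the post.

```r
library(lhs)

set.seed(1)
# 10 design points in the unit hypercube for 2 hyper-parameters
d <- randomLHS(10, 2)

# rescale each column to an assumed search range
grid <- data.frame(
  sigma = 0.1 + d[, 1] * (1.0 - 0.1),    # e.g. a smoothing parameter in [0.1, 1.0]
  mtry  = round(1 + d[, 2] * (10 - 1))   # e.g. an integer parameter in [1, 10]
)
grid
```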
Chunk Averaging of GLM
Chunk Average (CA) is an interesting concept proposed by Matloff in Chapter 13 of his book “Parallel Computing for Data Science”. The basic idea is to partition the entire model estimation sample into chunks and then to estimate a GLM for each chunk. Under the i.i.d. assumption, the CA estimator with the chunked data is asymptotically equi...
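A minimal sketch of the chunk-averaging idea is shown below: split the estimation sample into chunks, fit the same logit model on each chunk, and average the chunk-level coefficients. The simulated data and the four-chunk split are assumptions for illustration rather than Matloff's own example.

```r
set.seed(1)
n <- 10000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, 1, plogis(0.5 * df$x1 - 0.25 * df$x2))

# partition the sample into chunks
nchunk <- 4
idx <- split(sample(seq_len(n)), rep(1:nchunk, length.out = n))

# fit a glm on each chunk and average the coefficients across chunks
coefs <- sapply(idx, function(i) coef(glm(y ~ x1 + x2, data = df[i, ], family = binomial)))
ca_est <- rowMeans(coefs)

# compare with the estimate from the full sample
cbind(chunk_average = ca_est,
      full_sample = coef(glm(y ~ x1 + x2, data = df, family = binomial)))
```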
Monotonic Binning Driven by Decision Tree
After the development of the MOB package (https://github.com/statcompute/MonotonicBinning), I was asked by a couple of users about the possibility of using a decision tree to drive the monotonic binning. Although I am not aware of any R package implementing a decision tree with a monotonic constraint, I did manage to find a solution based upon the...
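The snippet below is not the solution referenced in the post, only a rough sketch of the general idea on assumed data: grow a shallow rpart tree on a single predictor, harvest the split points as bin boundaries, and then check whether the bad rate moves monotonically across the bins.

```r
library(rpart)

set.seed(1)
df <- data.frame(x = rnorm(5000))
df$y <- rbinom(5000, 1, plogis(df$x))

# shallow tree on the single predictor; the split points become cut-offs
tr <- rpart(factor(y) ~ x, data = df, method = "class",
            control = rpart.control(maxdepth = 3, cp = 0, minbucket = 500))
cuts <- sort(unique(tr$splits[, "index"]))

# bin the predictor with the harvested cut-offs and inspect the bad rate by bin
df$bin <- cut(df$x, breaks = c(-Inf, cuts, Inf))
tapply(df$y, df$bin, mean)
```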
Yet Another R Package for General Regression Neural Network
Compared with other types of neural networks, the General Regression Neural Network (Specht, 1991) is advantageous in several aspects. Being a universal approximation function, GRNN has only one tuning parameter to control the overall generalization. The network structure of GRNN is surprisingly simple, with only one hidden layer and the number of...
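To make the structure concrete, here is a minimal, assumed implementation of the GRNN prediction rule in its Nadaraya-Watson form, in which every training case acts as a hidden-layer neuron and sigma is the only tuning parameter; it is a sketch, not the code from the package discussed in the post.

```r
# minimal GRNN predictor: Gaussian-kernel weighted average of training outcomes
grnn_predict <- function(x_train, y_train, x_new, sigma) {
  apply(x_new, 1, function(z) {
    d2 <- rowSums(sweep(x_train, 2, z)^2)   # squared distance to every training case
    w  <- exp(-d2 / (2 * sigma^2))          # one kernel weight per hidden neuron
    sum(w * y_train) / sum(w)               # normalized weighted average
  })
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- x[, 1]^2 + x[, 2] + rnorm(100, sd = 0.1)
grnn_predict(x, y, matrix(c(0, 0, 1, 1), ncol = 2, byrow = TRUE), sigma = 0.5)
```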
Improve GRNN by Weighting
In the post (https://statcompute.wordpress.com/2019/07/14/yet-another-r-package-for-general-regression-neural-network), several advantages of the General Regression Neural Network (GRNN) have been discussed. However, as pointed out by Specht, a major weakness of GRNN is the high computational cost required for a GRNN to generate predicted values base...
Dummy Is As Dummy Does
In the 1975 edition of “Applied multiple regression/correlation analysis for the behavioral sciences” by Jacob Cohen, an interesting approach to handling missing values in numeric variables was proposed with the purpose of improving the traditional single-value imputation, as described below: – First of all, impute missing values by the value...
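The excerpt above is cut off before the fill-in value is named, so the sketch below simply assumes the variable mean as a stand-in; the mechanics, imputing the missing values and adding a missingness dummy to the regression (as the post title suggests), are what the sketch illustrates.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x = rnorm(100))
df$x[sample(100, 20)] <- NA

# flag the missingness with a dummy, then fill in the missing values
# (the mean is an assumed fill-in value, purely for illustration)
df$x_miss <- as.integer(is.na(df$x))
df$x_imp  <- ifelse(is.na(df$x), mean(df$x, na.rm = TRUE), df$x)

summary(lm(y ~ x_imp + x_miss, data = df))
```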
Develop Performance Benchmark with GRNN
It has been mentioned in https://github.com/statcompute/GRnnet that GRNN is an ideal approach for developing performance benchmarks for a variety of risk models. People might wonder what the purpose of performance benchmarks is and why we would even need one at all. Sometimes, a model developer has to answer questions about how well the mode...
Hyper-Parameter Optimization of General Regression Neural Networks
A major advantage of General Regression Neural Networks (GRNN) over other types of neural networks is that there is only a single hyper-parameter, namely the sigma. In the previous post (https://statcompute.wordpress.com/2019/07/06/latin-hypercube-sampling-in-hyper-parameter-optimization), I’ve shown how to use the random search strategy to fin...
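As a rough sketch of how the single sigma can be searched, the code below draws random sigma candidates, scores each on a holdout with the same minimal kernel-average form of GRNN shown earlier, and keeps the best one; it stands in for, rather than reproduces, the random search used in the post.

```r
set.seed(1)
x <- matrix(rnorm(400), ncol = 2)
y <- x[, 1]^2 + x[, 2] + rnorm(200, sd = 0.1)
trn <- sample(200, 150)

# kernel-average predictor with a single smoothing parameter sigma
pred <- function(s, xt, yt, xv) apply(xv, 1, function(z) {
  w <- exp(-rowSums(sweep(xt, 2, z)^2) / (2 * s^2))
  sum(w * yt) / sum(w)
})

# random candidates for sigma, scored by holdout mean squared error
sigmas <- runif(20, 0.1, 2)
mse <- sapply(sigmas, function(s) mean((y[-trn] - pred(s, x[trn, ], y[trn], x[-trn, ]))^2))
sigmas[which.min(mse)]
```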