Publications by statcompute

More Robust Monotonic Binning Based on Isotonic Regression

23.11.2018

Since publishing the monotonic binning function based on isotonic regression (https://statcompute.wordpress.com/2017/06/15/finer-monotonic-binning-based-on-isotonic-regression), I’ve received some feedback from peers. A potential concern is that, although it improves the granularity and predictability, the binning is too fine and might not ge...

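As a rough illustration of the underlying idea (not the post's actual function), the sketch below derives monotonic cut points from a base-R isotonic fit, assuming a numeric predictor x and a 0/1 bad flag y:

set.seed(2018)
x <- sort(rnorm(1000))                   # toy predictor, sorted for convenience
y <- rbinom(1000, 1, plogis(-1 + x))     # toy 0/1 bad flag with a monotone signal

iso <- isoreg(x, y)                      # isotonic fit of the bad rate on x
cuts <- head(sort(unname(tapply(x, iso$yf, max))), -1)  # interior step boundaries
bins <- cut(x, breaks = c(-Inf, cuts, Inf))
tapply(y, bins, mean)                    # bad rates are monotone across bins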

Improving Binning by Bootstrap Bumping

25.11.2018

In the post (https://statcompute.wordpress.com/2018/11/23/more-robust-monotonic-binning-based-on-isotonic-regression), a more robust version of monotonic binning based on isotonic regression was introduced. Nonetheless, due to the loss of granularity, the predictability has been somewhat compromised, which is a typical dilemma in the data sci...

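A minimal sketch of bootstrap bumping as applied to binning, assuming two hypothetical helpers: bin_fn(), which derives cut points from a sample, and iv_fn(), which scores a set of cut points (e.g. by information value) against the original data:

bump_bins <- function(x, y, bin_fn, iv_fn, B = 20, seed = 2018) {
  set.seed(seed)
  fits <- lapply(seq_len(B), function(b) {
    i <- sample(seq_along(x), replace = TRUE)   # draw a bootstrap sample
    cuts <- bin_fn(x[i], y[i])                  # derive the binning on the resample
    list(cuts = cuts, iv = iv_fn(x, y, cuts))   # evaluate it on the ORIGINAL data
  })
  fits[[which.max(sapply(fits, `[[`, "iv"))]]   # keep the best ("bumped") binning
}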

A Utility Function For Monotonic Binning

02.12.2018

In all the monotonic binning algorithms that I posted before, I relied heavily on the smbinning::smbinning.custom() function contributed by Herman Jopia as the utility function generating the binning output, and I therefore feel deeply indebted to his excellent work. However, the availability of the smbinning::smbinning.custom() function shouldn’t become my excus...

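For context, a bare-bones stand-in (far simpler than smbinning::smbinning.custom() and not the author's replacement) that, given cut points, tabulates bad rates, WoE, and IV for a 0/1 outcome y:

bin_summary <- function(x, y, cuts) {
  b <- cut(x, breaks = c(-Inf, cuts, Inf))
  bad  <- tapply(y, b, sum)
  good <- tapply(1 - y, b, sum)
  woe  <- log((bad / sum(bad)) / (good / sum(good)))
  data.frame(bin = levels(b), goods = as.vector(good), bads = as.vector(bad),
             bad_rate = as.vector(bad / (good + bad)), woe = as.vector(woe),
             iv = as.vector((bad / sum(bad) - good / sum(good)) * woe))
}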

Phillips-Ouliaris Test For Cointegration

16.12.2018

In a project developing PPNR balance projection models, I tried to use the Phillips-Ouliaris (PO) test to investigate the cointegration between the historical balance and a set of macro-economic variables, and noticed that the implementation routines of the PO test in various R packages, e.g. urca and tseries, would give different results. After readin...

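For reference, the two implementations mentioned above can be called as below; the toy random walks are merely placeholders for the balance and macro-economic series:

library(urca)
library(tseries)

set.seed(2018)
z <- cbind(y = cumsum(rnorm(200)), x = cumsum(rnorm(200)))   # toy I(1) series

po.test(z, demean = TRUE)                            # tseries implementation
summary(ca.po(z, demean = "constant", type = "Pu"))  # urca implementation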

Statistical Assessments of AUC

25.12.2018

In scorecard development, the area under the ROC curve, also known as AUC, has been widely used to measure the performance of a risk scorecard. All else being equal, the scorecard with a higher AUC is considered more predictive than the one with a lower AUC. However, little attention has been paid to the statistical analysis of AUC itself ...

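A minimal sketch of such an assessment with the pROC package (one possible tool, not necessarily the one used in the post): a confidence interval for a single AUC and a DeLong test comparing two AUCs on the same cases.

library(pROC)

set.seed(2018)
y  <- rbinom(500, 1, 0.3)
s1 <- y + rnorm(500)              # toy score from model 1
s2 <- y + rnorm(500, sd = 1.5)    # toy score from a weaker model 2

r1 <- roc(y, s1)
r2 <- roc(y, s2)
ci.auc(r1)                            # confidence interval of a single AUC
roc.test(r1, r2, method = "delong")   # DeLong test of the difference in AUCs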

Co-integration and Mean Reverting Portfolio

05.01.2019

In the previous post (https://statcompute.wordpress.com/2018/07/29/co-integration-and-pairs-trading), it was shown how to identify two co-integrated stocks in a pairs trade. In the example below, I will show how to form a mean-reverting portfolio with three or more co-integrated stocks, and also how to find the linear combinatio...

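As a hedged sketch of the idea, the Johansen procedure in urca yields a cointegrating vector whose components can serve as the weights of a mean-reverting spread; the three price series below are simulated stand-ins, not real stocks:

library(urca)

set.seed(2019)
f <- cumsum(rnorm(500))                       # common stochastic trend
p <- cbind(s1 = f + rnorm(500),
           s2 = 0.7 * f + rnorm(500),
           s3 = 1.3 * f + rnorm(500))

jo <- ca.jo(p, type = "trace", ecdet = "none", K = 2)
w  <- jo@V[, 1]                               # first cointegrating vector = weights
spread <- as.numeric(p %*% w)
plot(spread, type = "l")                      # the spread should look mean-reverting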

Sobol Sequence vs. Uniform Random in Hyper-Parameter Optimization

03.02.2019

Tuning hyper-parameters might be the most tedious yet crucial task in various machine learning algorithms, such as neural networks, SVM, or boosting. The configuration of hyper-parameters not only impacts the computational efficiency of a learning algorithm but also determines its prediction accuracy. Thus far, manual tuning and grid searching are st...

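For illustration, the two kinds of candidate points can be generated as below (the randtoolbox package is assumed for the Sobol sequence); the Sobol points cover the unit square more evenly than the plain uniform draws:

library(randtoolbox)

n <- 100
sob <- sobol(n, dim = 2)                  # low-discrepancy Sobol points in [0, 1]^2
uni <- matrix(runif(2 * n), ncol = 2)     # plain uniform random points

par(mfrow = c(1, 2))
plot(sob, main = "Sobol sequence", xlab = "parameter 1", ylab = "parameter 2")
plot(uni, main = "Uniform random", xlab = "parameter 1", ylab = "parameter 2")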

Direct Optimization of Hyper-Parameter

10.02.2019

In the previous post (https://statcompute.wordpress.com/2019/02/03/sobol-sequence-vs-uniform-random-in-hyper-parameter-optimization), it was shown how to identify the optimal hyper-parameter in a General Regression Neural Network by using the Sobol sequence and the uniform random generator, respectively, through N-fold cross validation. While th...

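The gist can be sketched with a stand-in model (loess instead of the GRNN from the post): treat the cross-validated error as a function of a single smoothing parameter and hand it to a general-purpose one-dimensional optimizer.

set.seed(2019)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
fold <- sample(rep(1:5, length.out = 200))

cv_error <- function(span) {
  mean(sapply(1:5, function(k) {
    fit <- loess(y[fold != k] ~ x[fold != k], span = span,
                 control = loess.control(surface = "direct"))
    mean((y[fold == k] - predict(fit, x[fold == k]))^2)
  }))
}

optimize(cv_error, interval = c(0.1, 1))    # directly optimized smoothing parameter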

Gradient-Free Optimization for GLMNET Parameters

23.02.2019

In the post https://statcompute.wordpress.com/2017/09/03/variable-selection-with-elastic-net, it was shown how to optimize the hyper-parameters of glmnet, namely alpha and lambda, by using the built-in cv.glmnet() function. However, following a similar logic of hyper-parameter optimization shown in the post https://statcompute.wordpress.com/2019/0...

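A hedged sketch of that logic (a toy data set, not the post's code): minimize a manually cross-validated log-loss over (alpha, log-lambda) with the gradient-free Nelder-Mead search in optim().

library(glmnet)

set.seed(2019)
x <- matrix(rnorm(500 * 10), ncol = 10)
y <- rbinom(500, 1, plogis(x[, 1] - x[, 2]))
fold <- sample(rep(1:5, length.out = 500))

cv_logloss <- function(par) {                     # par = c(alpha, log(lambda))
  a <- min(max(par[1], 0), 1)                     # keep alpha inside [0, 1]
  l <- exp(par[2])
  mean(sapply(1:5, function(k) {
    fit <- glmnet(x[fold != k, ], y[fold != k], family = "binomial",
                  alpha = a, lambda = l)
    p <- predict(fit, x[fold == k, ], type = "response")[, 1]
    p <- pmin(pmax(p, 1e-10), 1 - 1e-10)          # guard the log-loss
    -mean(y[fold == k] * log(p) + (1 - y[fold == k]) * log(1 - p))
  }))
}

optim(c(0.5, log(0.05)), cv_logloss, method = "Nelder-Mead",
      control = list(maxit = 50))$par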

Bayesian Optimization for Hyper-Parameter

24.02.2019

In the past several weeks, I spent a tremendous amount of time reading the literature about automatic parameter tuning in the context of Machine Learning (ML), most of which can be classified into two major categories, i.e. search and optimization. Searching mechanisms, such as grid search, random search, and the Sobol sequence, can be somewhat computatio...

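As an example of the optimization route, a hedged sketch with the rBayesianOptimization package (one option among several; the objective below is a toy surface standing in for a cross-validated model metric):

library(rBayesianOptimization)

toy_obj <- function(a, b) {
  # a real application would fit a model here and return a CV metric as Score
  list(Score = -((a - 0.3)^2 + (b - 0.7)^2), Pred = 0)
}

set.seed(2019)
opt <- BayesianOptimization(toy_obj,
                            bounds = list(a = c(0, 1), b = c(0, 1)),
                            init_points = 5, n_iter = 10, acq = "ucb")
opt$Best_Par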