Publications by statcompute

Model Operational Loss Directly with Tweedie GLM

29.06.2017

In the development of operational loss forecasting models, the Frequency-Severity modeling approach, in which the frequency and the severity of a Unit of Measure (UoM) are modeled separately, has been widely employed in the banking industry. However, sometimes it also makes sense to model the operational loss directly, especially for UoMs with non-m...

3057 sym R (1986 sym/2 pcs) 4 img
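
Below is a minimal sketch of the idea, assuming simulated data and the tweedie family from the statmod package (the post's own code may use a different setup): a variance power between 1 and 2 corresponds to a compound Poisson-Gamma distribution, which allows exact zeros together with a continuous density on positive losses.

library(statmod)

set.seed(1)
# hypothetical UoM-level data: aggregated quarterly losses (zeros allowed) and two drivers
df <- data.frame(loss = c(0, 0, 1200, 0, 5300, 800, 0, 2500, 0, 4100),
                 x1 = rnorm(10), x2 = rnorm(10))

# Tweedie GLM with log link; var.power between 1 and 2 implies compound Poisson-Gamma
m <- glm(loss ~ x1 + x2, data = df,
         family = tweedie(var.power = 1.5, link.power = 0))
summary(m)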

DART: Dropout Regularization in Boosting Ensembles

19.08.2017

The dropout approach developed by Hinton has been widely employed in deep learning to prevent deep neural networks from over-fitting, as shown in https://statcompute.wordpress.com/2017/01/02/dropout-regularization-in-deep-neural-networks. In the paper http://proceedings.mlr.press/v38/korlakaivinayak15.pdf, the dropout is also ...

2394 sym R (1263 sym/3 pcs) 4 img
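
As a hedged illustration of DART, the sketch below uses the xgboost package with booster = "dart" on simulated data; the parameter values (rate_drop, skip_drop) are illustrative assumptions rather than the post's settings.

library(xgboost)

set.seed(1)
X <- matrix(rnorm(500 * 10), ncol = 10)
y <- rbinom(500, 1, plogis(X[, 1] - X[, 2]))

dtrain <- xgb.DMatrix(data = X, label = y)

params <- list(
  booster   = "dart",          # gradient boosting with dropout of existing trees
  objective = "binary:logistic",
  max_depth = 3,
  eta       = 0.1,
  rate_drop = 0.1,             # fraction of trees dropped in each round
  skip_drop = 0.5              # probability of skipping dropout in a round
)

m <- xgb.train(params = params, data = dtrain, nrounds = 100)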

Model Operational Losses with Copula Regression

20.08.2017

In the previous post (https://statcompute.wordpress.com/2017/06/29/model-operational-loss-directly-with-tweedie-glm), it has been explained why we should consider modeling operational losses for non-material UoMs directly with Tweedie models. However, for material UoMs with significant losses, it is still beneficial to model the frequency and the...

2884 sym R (2669 sym/3 pcs) 6 img
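
The sketch below illustrates the dependence-modeling step in a generic way, assuming simulated frequency and severity data and a Gaussian copula fitted with the copula package; the post's actual code may rely on a different package or copula family.

library(copula)

set.seed(1)
# hypothetical UoM-level data: quarterly loss counts and average severities
freq <- rpois(200, lambda = 5)
sev  <- rgamma(200, shape = 2, scale = 10000)

# pseudo-observations (empirical CDF transform) of the two margins
u <- pobs(cbind(freq, sev))

# fit a bivariate Gaussian copula by maximum pseudo-likelihood
fit <- fitCopula(normalCopula(dim = 2), data = u, method = "mpl")
summary(fit)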

DART: Dropout Regularization in Boosting Ensembles

20.08.2017

The dropout approach developed by Hinton has been widely employed in deep learning to prevent deep neural networks from overfitting, as shown in https://statcompute.wordpress.com/2017/01/02/dropout-regularization-in-deep-neural-networks. In the paper http://proceedings.mlr.press/v38/korlakaivinayak15.pdf, the dropout can also be used to addr...

2396 sym R (1238 sym/3 pcs) 4 img
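
For comparison with the xgboost sketch above, the same dropout idea can be tried with the lightgbm package via boosting = "dart"; the data and parameter values below are again illustrative assumptions, not taken from the post.

library(lightgbm)

set.seed(1)
X <- matrix(rnorm(500 * 10), ncol = 10)
y <- rbinom(500, 1, plogis(X[, 1] - X[, 2]))

dtrain <- lgb.Dataset(data = X, label = y)

params <- list(
  boosting      = "dart",   # gradient boosting with tree dropout
  objective     = "binary",
  learning_rate = 0.1,
  drop_rate     = 0.1,      # fraction of trees dropped per iteration
  skip_drop     = 0.5       # probability of skipping the dropout step
)

m <- lgb.train(params = params, data = dtrain, nrounds = 100)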

Variable Selection with Elastic Net

03.09.2017

LASSO has been a popular algorithm for variable selection and is extremely effective with high-dimensional data. However, it often tends to “over-regularize” a model, which might become overly compact and therefore under-predictive. The Elastic Net addresses the aforementioned “over-regularization” by balancing between LASSO and ridge penalties...

2048 sym R (2849 sym/4 pcs) 4 img
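
A minimal sketch of that balancing act, assuming simulated data and the glmnet package: alpha = 1 gives the LASSO penalty, alpha = 0 gives the ridge penalty, and values in between give the Elastic Net.

library(glmnet)

set.seed(1)
X <- matrix(rnorm(200 * 50), ncol = 50)
y <- drop(X[, 1:5] %*% rep(1, 5)) + rnorm(200)

# 0 < alpha < 1 mixes the LASSO and ridge penalties
cv <- cv.glmnet(X, y, alpha = 0.5, nfolds = 5)

# nonzero coefficients at the lambda chosen by cross-validation
coef(cv, s = "lambda.min")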

Model Non-Negative Numeric Outcomes with Zeros

17.09.2017

As mentioned in the previous post (https://statcompute.wordpress.com/2017/06/29/model-operational-loss-directly-with-tweedie-glm/), we often need to model non-negative numeric outcomes with zeros in operational loss model development. The Tweedie GLM provides a convenient interface to model non-negative losses directly by assuming that aggregated...

3062 sym R (3266 sym/4 pcs) 4 img
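
One common alternative for non-negative outcomes with zeros, sketched below on simulated data under assumptions not taken from the post, is a two-part (hurdle-style) model: a logistic model for the probability of a positive loss and a Gamma GLM for the loss size given that it is positive.

set.seed(1)
df <- data.frame(x = rnorm(500))
p  <- plogis(-0.5 + df$x)
df$loss <- rbinom(500, 1, p) * rgamma(500, shape = 2, scale = 1000)

m1 <- glm(I(loss > 0) ~ x, data = df, family = binomial)                        # occurrence
m2 <- glm(loss ~ x, data = subset(df, loss > 0), family = Gamma(link = "log"))  # size

# expected loss = P(loss > 0) * E[loss | loss > 0]
pred <- predict(m1, df, type = "response") * predict(m2, df, type = "response")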

Modeling LGD with Proportional Odds Model

28.01.2018

The LGD model is an important component in the expected loss calculation. In https://statcompute.wordpress.com/2015/11/01/quasi-binomial-model-in-sas, I discussed how to model LGD with the quasi-binomial regression, which is simple and makes no distributional assumption. In real-world LGD data, we would usually observe 3 ordered categories of v...

2833 sym R (1000 sym/3 pcs)
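
A minimal sketch of a proportional odds model for three ordered LGD categories, assuming simulated data and MASS::polr; the predictors and cut points below are illustrative.

library(MASS)

set.seed(1)
df <- data.frame(ltv = runif(300), score = rnorm(300))
# ordered outcome: full recovery (LGD = 0), partial loss (0 < LGD < 1), full loss (LGD = 1)
df$lgd_cat <- cut(plogis(-1 + 2 * df$ltv + 0.5 * rnorm(300)),
                  breaks = c(0, 0.3, 0.7, 1),
                  labels = c("zero", "partial", "full"),
                  ordered_result = TRUE)

m <- polr(lgd_cat ~ ltv + score, data = df, method = "logistic")
predict(m, df, type = "probs")[1:3, ]   # class probabilities for the first rows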

Additional Thoughts on Estimating LGD with Proportional Odds Model

06.02.2018

In my previous post (https://statcompute.wordpress.com/2018/01/28/modeling-lgd-with-proportional-odds-model), I discussed how to use Proportional Odds Models in the LGD model development. In particular, I mentioned that we would estimate a sub-model, which can be a Gamma or Simplex regression, to project the conditional mean for L...

2932 sym R (2174 sym/4 pcs)
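
A hedged sketch of that combination on simulated data: category probabilities from a proportional odds model plus a Gamma sub-model (one of the two options mentioned; Simplex regression would be the other) for the conditional mean of the partial-loss segment.

library(MASS)

set.seed(1)
df <- data.frame(ltv = runif(500))
df$lgd <- pmin(pmax(df$ltv + rnorm(500, sd = 0.3) - 0.2, 0), 1)
df$cat <- cut(df$lgd, breaks = c(-Inf, 0, 1 - 1e-8, Inf),
              labels = c("zero", "partial", "full"), ordered_result = TRUE)

m_cat <- polr(cat ~ ltv, data = df)                                   # category model
m_sub <- glm(lgd ~ ltv, data = subset(df, cat == "partial"),
             family = Gamma(link = "log"))                            # conditional-mean sub-model

p  <- predict(m_cat, df, type = "probs")
mu <- predict(m_sub, df, type = "response")
# E[LGD] = P(zero) * 0 + P(partial) * E[LGD | partial] + P(full) * 1
elgd <- p[, "partial"] * mu + p[, "full"]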

R Interfaces to Python Keras Package

11.02.2018

Keras is a popular Python package for prototyping deep neural networks with multiple backends, including TensorFlow, CNTK, and Theano. Currently, there are two R interfaces that allow us to use Keras from R through the reticulate package. While the keras R package is able to provide a flexible and feature-rich API, the kerasR R package ...

1090 sym R (2059 sym/1 pcs)
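
A minimal sketch with the keras R package (one of the two reticulate-based interfaces mentioned above), assuming an illustrative architecture and random data; running it requires a working Python Keras installation.

library(keras)

x <- matrix(rnorm(1000 * 20), ncol = 20)
y <- rbinom(1000, 1, 0.5)

# a small feed-forward network for a binary outcome
model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(20)) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(loss = "binary_crossentropy",
                  optimizer = "adam",
                  metrics = "accuracy")

model %>% fit(x, y, epochs = 5, batch_size = 32, verbose = 0)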

MLE in R

25.02.2018

Whenever I learn and experiment with a new model, I always like to start with its likelihood function in order to gain a better understanding of its statistical nature. That’s why I have extensively used the SAS/NLMIXED procedure, which gives me more flexibility. Today, I spent a couple of hours playing with the optim() function and its wrappers, e.g. mle() and...

1127 sym R (1902 sym/1 pcs)
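
A minimal sketch of the workflow on simulated data: write the negative log-likelihood of a Poisson regression, minimize it with optim(), and check the estimates against glm().

set.seed(1)
x <- rnorm(500)
y <- rpois(500, exp(0.5 + 0.8 * x))

# negative log-likelihood of a Poisson regression with a log link
nll <- function(b) -sum(dpois(y, lambda = exp(b[1] + b[2] * x), log = TRUE))

fit <- optim(par = c(0, 0), fn = nll, method = "BFGS", hessian = TRUE)
fit$par                         # MLE of the intercept and slope
sqrt(diag(solve(fit$hessian)))  # standard errors from the observed information

coef(glm(y ~ x, family = poisson))  # should closely match fit$par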