Publications by statcompute
Modeling Severity in Operational Losses with Python
When modeling severity measurements in operational losses with Generalized Linear Models, we have a couple of choices based on different distributional assumptions, including Gamma, Inverse Gaussian, and Lognormal. However, based on my observations from empirical work, the differences in parameter estimates among these three popular cand...
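As a quick illustration of the three assumptions, the sketch below fits each as a GLM in Python with statsmodels on synthetic data; the column names (loss, age) and the simulated severities are assumptions for illustration, not the post's actual data.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# synthetic stand-in for a severity sample
rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.integers(20, 70, 200)})
df["loss"] = rng.gamma(2.0, 500.0, 200)

# Gamma GLM with a log link
gamma_fit = smf.glm("loss ~ age", data=df,
                    family=sm.families.Gamma(sm.families.links.Log())).fit()

# Inverse Gaussian GLM with a log link
ig_fit = smf.glm("loss ~ age", data=df,
                 family=sm.families.InverseGaussian(sm.families.links.Log())).fit()

# Lognormal, i.e. a Gaussian GLM on the log-transformed severity
ln_fit = smf.glm("np.log(loss) ~ age", data=df,
                 family=sm.families.Gaussian()).fit()

# all three operate on the log scale, so the coefficients are comparable
for fit in (gamma_fit, ig_fit, ln_fit):
    print(fit.params)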
Modeling Frequency in Operational Losses with Python
Poisson and Negative Binomial regressions are two popular approaches to modeling frequency measures in operational losses and can be implemented in Python with the statsmodels package as below:

In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: import statsmodels.formula.api as smf
In [4]: df = pd.read_csv("AutoCollision.c...
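Since the excerpt cuts off before the fits, here is a minimal self-contained sketch of the two regressions under the same statsmodels approach; the synthetic counts and column names stand in for the post's actual input file.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# synthetic counts with a single predictor
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(size=500)})
df["count"] = rng.poisson(np.exp(0.5 + 0.3 * df["x"]))

# Poisson regression
poisson_fit = smf.glm("count ~ x", data=df, family=sm.families.Poisson()).fit()

# Negative Binomial regression
nb_fit = smf.glm("count ~ x", data=df, family=sm.families.NegativeBinomial()).fit()

print(poisson_fit.params)
print(nb_fit.params)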
Fitting Generalized Regression Neural Network with Python
In [1]: # LOAD PACKAGES
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: from sklearn import preprocessing as pp
In [5]: from sklearn import cross_validation as cv
In [6]: from neupy.algorithms import GRNN as grnn
In [7]: from neupy.functions import mse
In [8]: # DATA PROCESSING
In [9]: df = pd.read_table("csdata.txt")
In [...
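The excerpt cuts off before the model fit; a rough sketch of how it might continue is below, assuming the neupy GRNN interface of that era (a std smoothing parameter with train/predict methods) and using the modern sklearn train_test_split in place of the since-removed cross_validation module. The synthetic data stand in for csdata.txt.

import numpy as np
from sklearn import preprocessing as pp
from sklearn.model_selection import train_test_split
from neupy.algorithms import GRNN as grnn

# synthetic stand-in for the post's data
rng = np.random.default_rng(2)
x = rng.normal(size=(200, 5))
y = x.sum(axis=1) + rng.normal(scale=0.1, size=200)

# standardize inputs and hold out a test set
x_train, x_test, y_train, y_test = train_test_split(
    pp.scale(x), y, test_size=0.3, random_state=2)

# fit the GRNN with an assumed smoothing parameter and score it out of sample
net = grnn(std=0.1, verbose=False)
net.train(x_train, y_train)
pred = np.ravel(net.predict(x_test))
print(np.mean((pred - y_test) ** 2))  # out-of-sample MSE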
Calculate Leave-One-Out Prediction for GLM
In model development, the “leave-one-out” prediction is a form of cross-validation, calculated as below:
1. After a model is developed, each observation used in the model development is removed in turn and the model is then refitted with the remaining observations.
2. The out-of-sample prediction for the refitted model is calc...
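A minimal sketch of this procedure is below, using a Poisson GLM in Python as an illustrative stand-in for the post's R code; the synthetic data and model form are assumptions.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# synthetic data standing in for a development sample
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=100)})
df["y"] = rng.poisson(np.exp(0.2 + 0.5 * df["x"]))

loo_pred = []
for i in df.index:
    # 1. remove observation i and refit with the remaining observations
    refit = smf.glm("y ~ x", data=df.drop(index=i),
                    family=sm.families.Poisson()).fit()
    # 2. out-of-sample prediction for the held-out observation
    loo_pred.append(refit.predict(df.loc[[i]]).iloc[0])

df["loo"] = loo_pred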
Prediction Intervals for Poisson Regression
Unlike the confidence interval, which addresses the uncertainty around the conditional mean, the prediction interval accommodates the additional uncertainty associated with prediction errors. As a result, the prediction interval is always wider than the confidence interval in a regression model. In the context of risk modeling,...
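One common way to build such an interval is by simulation, layering outcome noise on top of coefficient uncertainty; the Python sketch below illustrates that construction on synthetic data and is not necessarily the exact method used in the post.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# synthetic Poisson data
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.normal(size=300)})
df["y"] = rng.poisson(np.exp(0.3 + 0.6 * df["x"]))

fit = smf.glm("y ~ x", data=df, family=sm.families.Poisson()).fit()

# draw coefficients from their asymptotic normal distribution, then draw a
# Poisson outcome around each simulated mean at a new point x = 1
beta = rng.multivariate_normal(fit.params.values, fit.cov_params().values, size=5000)
design = np.array([1.0, 1.0])  # intercept and x = 1, matching the params order
sims = rng.poisson(np.exp(beta @ design))

print(np.percentile(sims, [2.5, 97.5]))  # 95% prediction interval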
The Power of Decision Stumps
A decision stump is a weak classification model with a simple tree structure consisting of one split, which can also be considered a one-level decision tree. Due to its simplicity, the stump often demonstrates low predictive performance. As shown in the example below, the AUC measure of a stump is even lower than that of a single attribu...
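The post's example is in R; a rough sklearn analogue on synthetic data is sketched below, contrasting a lone stump with boosted stumps (the data and AUC values are illustrative only).

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# a one-level stump on its own
stump = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr)
print("stump AUC:", roc_auc_score(y_te, stump.predict_proba(X_te)[:, 1]))

# boosted stumps: AdaBoost's default base learner is a depth-1 tree
boost = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("boosted stumps AUC:", roc_auc_score(y_te, boost.predict_proba(X_te)[:, 1]))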
Where Bagging Might Work Better Than Boosting
In the previous post (https://statcompute.wordpress.com/2016/01/01/the-power-of-decision-stumps), it was shown that the boosting algorithm performs extremely well even with a simple 1-level stump as the base learner and provides a better performance lift than the bagging algorithm does. However, this observation shouldn’t be generalized, which ...
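One way to set up the comparison in Python is sketched below; the noisy synthetic data is an assumption meant to mimic a setting where bagging's variance reduction can pay off, and the relative ranking will vary from case to case.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# flip_y injects label noise, a setting that tends to favor bagging
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

bag = BaggingClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)   # deep trees by default
ada = AdaBoostClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)  # stumps by default

print("bagging  AUC:", roc_auc_score(y_te, bag.predict_proba(X_te)[:, 1]))
print("boosting AUC:", roc_auc_score(y_te, ada.predict_proba(X_te)[:, 1]))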
Improve SVM Tuning through Parallelism
As pointed out in Chapter 10 of “The Elements of Statistical Learning”, ANN and SVM (support vector machines) share similar pros and cons, e.g. lack of interpretability and good predictive power. However, in contrast to ANN, which usually suffers from local-minimum solutions, SVM is always able to converge globally. In addition, SVM is less pro...
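The post parallelizes the tuning in R; the same idea in Python might look like the GridSearchCV sketch below, where the grid values are arbitrary assumptions and n_jobs=-1 spreads the cross-validated fits across all available cores.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# grid over the RBF kernel's cost and width parameters
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}

# each (C, gamma, fold) fit is an independent task, so they parallelize cleanly
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)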
More Flexible Approaches to Model Frequency
(The post below is motivated by my friend Matt Flynn: https://www.linkedin.com/in/matthew-flynn-1b443b11) In the context of operational loss forecast models, the standard Poisson regression is the most popular way to model frequency measures. Conceptually speaking, there is a restrictive assumption in the standard Poisson regression, namely Equi-...
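A hedged sketch of checking and relaxing that assumption in Python is below; the overdispersed synthetic counts are an assumption, and quasi-Poisson-style scaling plus negative binomial are two common relaxations rather than necessarily the ones the post covers.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# synthetic overdispersed counts (variance > mean)
rng = np.random.default_rng(3)
df = pd.DataFrame({"x": rng.normal(size=500)})
mu = np.exp(0.4 + 0.5 * df["x"])
df["y"] = rng.negative_binomial(2, 2 / (2 + mu))

poisson_fit = smf.glm("y ~ x", data=df, family=sm.families.Poisson()).fit()

# a Pearson-based dispersion estimate well above 1 flags an Equi-Dispersion failure
print(poisson_fit.pearson_chi2 / poisson_fit.df_resid)

# quasi-Poisson-style scaling and a negative binomial fit
quasi_fit = smf.glm("y ~ x", data=df, family=sm.families.Poisson()).fit(scale="X2")
nb_fit = smf.glm("y ~ x", data=df, family=sm.families.NegativeBinomial()).fit()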
Risk Models with Generalized PLS
When developing risk models with hundreds of potential variables, we often run into the situation where risk characteristics or macro-economic indicators are highly correlated, namely multicollinearity. In such cases, we might have to drop variables with high VIFs or employ “variable shrinkage” methods, e.g. lasso or ridge, to suppress variab...
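The post's generalized PLS presumably extends PLS to GLM-type outcomes in R; sklearn's PLSRegression covers only the continuous case, so the sketch below is an analogy showing how PLS components sidestep multicollinearity rather than a replication of the post.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

# ten nearly collinear predictors driven by one latent factor
rng = np.random.default_rng(4)
z = rng.normal(size=(300, 1))
X = z + rng.normal(scale=0.05, size=(300, 10))
y = z.ravel() + rng.normal(scale=0.5, size=300)

# PLS extracts a few orthogonal components, so the fit stays stable
# even though pairwise correlations among the predictors are near 1
pls = PLSRegression(n_components=2).fit(X, y)
print(pls.score(X, y))  # in-sample R-squared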