Publications by statcompute

Ensemble Learning with Cubist Model

20.03.2015

The tree-based Cubist model can easily be used to develop an ensemble classifier with a scheme called “committees”. The concept of “committees” is similar to that of “boosting”, in that a series of trees is developed sequentially with adjusted weights. However, the final prediction is the simple average of predictions from all “committee...
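
The committee idea can be sketched in base R. This is a rough illustration, not the Cubist package: `committee_fit()` and `committee_predict()` are hypothetical helpers, `rpart` regression trees stand in for Cubist's rule-based model trees, and `mtcars` is just a convenient built-in data set. Cubist's documentation describes each new committee member as being fit to an adjusted outcome, y - (yhat - y), so cases the previous member over-predicted are pushed down; the committee prediction is an unweighted average.

```r
library(rpart)

# Hypothetical helper: sequentially fit a committee of regression trees,
# each one trained on an outcome adjusted by the previous member's errors
committee_fit <- function(formula, data, committees = 5) {
  yname <- all.vars(formula)[1]
  y <- data[[yname]]
  target <- y
  members <- vector("list", committees)
  for (m in seq_len(committees)) {
    d <- data
    d[[yname]] <- target
    members[[m]] <- rpart(formula, data = d)
    pred <- predict(members[[m]], newdata = data)
    # over-predicted cases are adjusted down, under-predicted cases up
    target <- y - (pred - y)
  }
  members
}

# Simple average of member predictions (no stage weights, unlike boosting)
committee_predict <- function(members, newdata) {
  preds <- do.call(cbind, lapply(members, predict, newdata = newdata))
  rowMeans(preds)
}

fit <- committee_fit(mpg ~ ., data = mtcars, committees = 5)
p <- committee_predict(fit, mtcars)
head(p)
```

The plain average is the key difference from boosting, where stage weights determine how much each member contributes.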

Autoregressive Conditional Poisson Model – I

29.03.2015

Modeling time series of count outcomes is of interest in operational risk when forecasting the frequency of losses. Below is an example showing how to estimate a simple ACP(1, 1) model, i.e. Autoregressive Conditional Poisson, without covariates with the acp package. library(acp) ### acp(1, 1) without covariates ### mdl <- acp(y ~ -1, data ...
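
In an ACP(1, 1) model, y_t given the past is Poisson with conditional mean lambda_t = omega + alpha * y_{t-1} + beta * lambda_{t-1}. As a sketch of what is being estimated (not the acp package's own code), the hypothetical helpers below simulate such a series and maximize the Poisson log-likelihood with `optim()`. The parameter values and the choice to start the recursion at the sample mean are assumptions made for illustration.

```r
# Hypothetical helper: simulate an ACP(1, 1) count series
simulate_acp <- function(n, omega, alpha, beta) {
  y <- integer(n)
  lambda <- numeric(n)
  lambda[1] <- omega / (1 - alpha - beta)  # unconditional mean as a starting value
  y[1] <- rpois(1, lambda[1])
  for (t in 2:n) {
    lambda[t] <- omega + alpha * y[t - 1] + beta * lambda[t - 1]
    y[t] <- rpois(1, lambda[t])
  }
  y
}

# Negative Poisson log-likelihood; par = c(omega, alpha, beta)
acp_nll <- function(par, y) {
  lambda <- numeric(length(y))
  lambda[1] <- mean(y)  # assumption: initialize the recursion at the sample mean
  for (t in 2:length(y)) {
    lambda[t] <- par[1] + par[2] * y[t - 1] + par[3] * lambda[t - 1]
  }
  -sum(dpois(y, lambda, log = TRUE))
}

set.seed(2015)
y <- simulate_acp(1000, omega = 1, alpha = 0.3, beta = 0.5)
start <- c(0.5, 0.2, 0.2)
# Box constraints keep lambda_t positive (omega > 0, alpha and beta >= 0)
fit <- optim(start, acp_nll, y = y, method = "L-BFGS-B",
             lower = c(1e-6, 0, 0), upper = c(Inf, 0.99, 0.99))
round(fit$par, 3)  # estimates of omega, alpha, beta
```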

rPithon vs. rPython

30.03.2015

Similar to rPython, the rPithon package (http://rpithon.r-forge.r-project.org) allows users to execute Python code from R and exchange data between Python and R. However, the underlying mechanisms of these two packages are fundamentally different. While rPithon communicates with Python from R through pipes, rPython accomplishes the same ...
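
To make the pipe-based approach concrete, the sketch below reads the output of a Python child process through a base R `pipe()` connection. It is not the rPithon API: it assumes a `python3` executable is on the PATH (and does nothing otherwise), and it passes the R data as a command-line argument, whereas a real bridge would keep a persistent two-way channel open.

```r
# Sketch only (NOT the rPithon API): run Python as a child process and
# read its standard output back into R through a pipe connection
py <- Sys.which("python3")
result <- NA_real_
if (nzchar(py)) {
  code <- "import sys; xs = [float(v) for v in sys.argv[1].split()]; print(sum(xs) / len(xs))"
  data_arg <- paste(c(1, 2, 3, 4), collapse = " ")  # R data serialized as text
  cmd <- paste(shQuote(py), "-c", shQuote(code), shQuote(data_arg))
  con <- pipe(cmd, open = "r")  # Python's stdout flows back to R here
  result <- as.numeric(readLines(con))
  close(con)
}
result
```

Everything crossing the pipe is text, which is why pipe-based bridges need to serialize data on both sides.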

Modeling Count Time Series with tscount Package

31.03.2015

The example below shows how to estimate a simple univariate Poisson time series model with the tscount package. While the model estimation is straightforward and yields parameter estimates very similar to those generated with the acp package (https://statcompute.wordpress.com/2015/03/29/autoregressive-conditional-poisson-model-i), the predicti...
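
For a Poisson model of the ACP(1, 1) form, point forecasts follow a simple recursion: because E[y_{T+k} | F_T] = E[lambda_{T+k} | F_T], every step after the first becomes lambda = omega + (alpha + beta) * lambda. The hypothetical helper below computes h-step-ahead conditional means from given parameter values, which can be a handy cross-check when comparing package predictions. It is not the predict method of tscount or acp, and the inputs are made-up numbers.

```r
# Hypothetical helper: h-step-ahead conditional mean forecasts for ACP(1, 1)
forecast_acp <- function(omega, alpha, beta, y_last, lambda_last, h) {
  out <- numeric(h)
  # one step ahead uses the last observed count and the last conditional mean
  out[1] <- omega + alpha * y_last + beta * lambda_last
  # further steps replace the unknown future count by its expectation
  if (h > 1) for (k in 2:h) out[k] <- omega + (alpha + beta) * out[k - 1]
  out
}

f <- forecast_acp(omega = 1, alpha = 0.3, beta = 0.5,
                  y_last = 6, lambda_last = 5, h = 3)
f  # 5.3, 5.24, 5.192, converging toward omega / (1 - alpha - beta) = 5
```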

To Difference or Not To Difference?

09.05.2015

In time series analysis textbooks, we’ve been taught to difference a time series in order to obtain a stationary series, a step that can be justified by various plots and statistical tests. In real-world time series analysis, things are not always as clear as the textbook suggests. For instance, although the ACF plot shows a not-so-slow dec...
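
One way to see the ambiguity is to simulate a stationary but highly persistent AR(1) series: it needs no differencing, yet its ACF decays slowly. The sketch below uses only the stats package; phi = 0.95 and the Phillips-Perron test (`PP.test`) are choices made for illustration, not necessarily the tests used in the post.

```r
set.seed(1)
# Stationary AR(1) with phi = 0.95: no unit root, but the ACF decays slowly
x <- arima.sim(model = list(ar = 0.95), n = 200)
acf_lags <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]  # drop lag 0
round(acf_lags, 2)

# Phillips-Perron test; H0: the series has a unit root
pp_level <- PP.test(x)$p.value
pp_diff <- PP.test(diff(x))$p.value
c(level = pp_level, differenced = pp_diff)

# Candidate models: AR(1) in levels vs. a random walk, ARIMA(0, 1, 0)
fit_level <- arima(x, order = c(1, 0, 0))
fit_diff <- arima(x, order = c(0, 1, 0))
coef(fit_level)
```

With 200 observations, a unit-root test has limited power against phi = 0.95, which is exactly the situation where plots and tests can point in different directions.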

Read A Block of Spreadsheet with R

10.05.2015

In R, there are two ways to read a block of a spreadsheet, e.g. an xlsx file, such as the one shown below. The xlsx package provides the most intuitive interface with the readColumns() function, which explicitly defines the starting and ending columns and rows. library(xlsx) file <- loadWorkbook("C:/Documents and Settings/Administrator/Desktop/test.xlsx...
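
When the xlsx package (and the Java runtime it relies on through rJava) is not available, one fallback is to save the sheet as CSV and cut out the block with base R. The `read_block()` helper below is hypothetical and not part of xlsx; it only mirrors the idea of explicitly defining the starting and ending rows and columns, and the temporary file stands in for a real export.

```r
# Hypothetical helper: read rows first_row..last_row and columns
# first_col..last_col from a CSV export of a spreadsheet
read_block <- function(file, first_row, last_row, first_col, last_col) {
  raw <- read.csv(file, header = FALSE, skip = first_row - 1,
                  nrows = last_row - first_row + 1)
  raw[, first_col:last_col, drop = FALSE]
}

# Usage with a small temporary file
f <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d", "1,2,3,4", "5,6,7,8", "9,10,11,12"), f)
blk <- read_block(f, first_row = 2, last_row = 3, first_col = 2, last_col = 3)
blk
```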

Granger Causality Test

25.05.2015

# READ QUARTERLY DATA FROM CSV
library(zoo)
ts1 <- read.zoo('Documents/data/macros.csv', header = TRUE, sep = ",", FUN = as.yearqtr)
# CONVERT THE DATA TO STATIONARY TIME SERIES
ts1$hpi_rate <- log(ts1$hpi / lag(ts1$hpi))
ts1$unemp_rate <- log(ts1$unemp / lag(ts1$unemp))
ts2 <- ts1[1:(nrow(ts1) - 1), c(3, 4)]
# METHOD 1: LMTEST PACKAGE
library(lmtes...
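
A Granger causality test compares a regression of y on its own lags with one that also includes lags of x; an F-test on the two nested models asks whether the lags of x add predictive power. The base R sketch below illustrates that idea with a hypothetical `granger_f()` helper (not the lmtest function) on simulated data where x drives y by construction, not on the macro data above.

```r
# Hypothetical helper: nested-model F-test of "x Granger-causes y"
granger_f <- function(y, x, order = 1) {
  # embed() returns columns t, t-1, ..., t-order
  ey <- embed(y, order + 1)
  ex <- embed(x, order + 1)
  yt <- ey[, 1]
  ylags <- ey[, -1, drop = FALSE]
  xlags <- ex[, -1, drop = FALSE]
  restricted <- lm(yt ~ ylags)            # own lags only
  unrestricted <- lm(yt ~ ylags + xlags)  # own lags plus lags of x
  anova(restricted, unrestricted)
}

set.seed(7)
x <- rnorm(200)
y <- numeric(200)
for (t in 2:200) y[t] <- 0.3 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1)

res <- granger_f(y, x, order = 2)
p_value <- res[2, "Pr(>F)"]
p_value
```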

Are These Losses from The Same Distribution?

14.06.2015

In Advanced Measurement Approaches (AMA) for Operational Risk models, the bank needs to segment operational losses into homogeneous segments known as “Units of Measure (UoM)”, which are often defined by the combination of lines of business (LOB) and Basel II event types. However, how can we verify whether the losses in one UoM are statistical...
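
Base R offers nonparametric tests that fit this question. The sketch below applies the two-sample Kolmogorov-Smirnov test to pairs of segments and the Kruskal-Wallis rank test across all of them. The lognormal severities for three UoMs are simulated assumptions, and these tests are illustrative choices, not necessarily the ones used in the post. Note that Kruskal-Wallis is mainly sensitive to location shifts, while KS compares entire distributions.

```r
set.seed(123)
# Simulated severities for three hypothetical UoMs (assumed lognormal parameters)
uom_a <- rlnorm(300, meanlog = 10, sdlog = 2)
uom_b <- rlnorm(300, meanlog = 10, sdlog = 2)
uom_c <- rlnorm(300, meanlog = 11, sdlog = 2)

# Two-sample Kolmogorov-Smirnov tests; H0: same continuous distribution
p_ab <- ks.test(uom_a, uom_b)$p.value
p_ac <- ks.test(uom_a, uom_c)$p.value

# Kruskal-Wallis rank test across all three segments at once
p_kw <- kruskal.test(list(uom_a, uom_b, uom_c))$p.value

c(ab = p_ab, ac = p_ac, kw = p_kw)
```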

Some Considerations of Modeling Severity in Operational Losses

16.08.2015

In the Loss Distribution Approach (LDA) for Operational Risk models, multiple distributions, including Log Normal, Gamma, Burr, Pareto, and so on, can be considered as candidates for the distribution of severity measures. However, a challenge remains in stress testing exercises, e.g. CCAR: how to relate operational losses to macro-economic scenar...
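
Choosing among candidate severity distributions often starts with fitting each by maximum likelihood and comparing AIC. The base R sketch below does this for two of the candidates named above, Log Normal and Gamma, on simulated losses. The data-generating parameters are assumptions for illustration, and the Gamma fit uses a generic `optim()` search rather than any dedicated fitting routine.

```r
set.seed(42)
# Simulated severities (assumption: lognormal data, for illustration only)
loss <- rlnorm(500, meanlog = 9, sdlog = 1.5)

# Log Normal: closed-form maximum likelihood estimates
mu <- mean(log(loss))
sigma <- sqrt(mean((log(loss) - mu)^2))
ll_lnorm <- sum(dlnorm(loss, meanlog = mu, sdlog = sigma, log = TRUE))

# Gamma: numerical MLE on the log scale to keep shape and rate positive
nll_gamma <- function(p) {
  -sum(dgamma(loss, shape = exp(p[1]), rate = exp(p[2]), log = TRUE))
}
m <- mean(loss)
v <- var(loss)
g <- optim(c(log(m^2 / v), log(m / v)), nll_gamma)  # method-of-moments start
ll_gamma <- -g$value

# AIC = 2k - 2 logLik; both candidates have k = 2 parameters
aic <- c(lognormal = 4 - 2 * ll_lnorm, gamma = 4 - 2 * ll_gamma)
aic
```

Heavier-tailed candidates such as Burr or Pareto can be added the same way by writing their log-densities, since base R does not ship them.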

Estimating Quasi-Poisson Regression with GLIMMIX in SAS

14.10.2015

When modeling the frequency measure in operational risk with regressions, most modelers prefer Poisson or Negative Binomial regression as industry best practice. However, as an alternative approach, Quasi-Poisson regression provides a more flexible model estimation routine with at least two benefits. First of all, Quasi-Poisson...
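
The post works in SAS GLIMMIX; in R, the analogous fit is `glm()` with `family = quasipoisson`. The sketch below uses simulated overdispersed counts (an assumption, generated from a negative binomial) to illustrate one well-known property of Quasi-Poisson estimation. The coefficients are identical to the Poisson fit, while the standard errors are scaled by the square root of the estimated dispersion phi instead of assuming phi = 1.

```r
set.seed(10)
n <- 300
x <- rnorm(n)
# Overdispersed counts: variance exceeds the mean
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 2)

pois <- glm(y ~ x, family = poisson)
qpois <- glm(y ~ x, family = quasipoisson)

# Pearson-based dispersion estimate used by the quasi-Poisson fit
phi <- summary(qpois)$dispersion

# Same point estimates, standard errors inflated by sqrt(phi)
se_ratio <- coef(summary(qpois))[, "Std. Error"] /
  coef(summary(pois))[, "Std. Error"]
cbind(poisson = coef(pois), quasipoisson = coef(qpois))
c(phi = phi, sqrt_phi = sqrt(phi))
```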
