Publications by statcompute
Ensemble Learning with Cubist Model
The tree-based Cubist model can easily be used to build an ensemble model with a scheme called “committees”. The concept of “committees” is similar to “boosting” in that a series of trees is developed sequentially with adjusted weights. However, the final prediction is the simple average of predictions from all “committee...
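The committee idea described above can be tried with a minimal sketch like the one below. It assumes the Cubist package is installed and uses the built-in mtcars data as a stand-in regression problem; the data set, the number of committees, and the in-sample RMSE comparison are illustrative choices, not taken from the original post.

```r
# Sketch only: compare a single Cubist model with a 10-member committee.
# mtcars and committees = 10 are assumptions made for illustration.
if (requireNamespace("Cubist", quietly = TRUE)) {
  x <- mtcars[, -1]   # predictors
  y <- mtcars$mpg     # numeric outcome (Cubist is a regression model)

  fit_single    <- Cubist::cubist(x = x, y = y, committees = 1)
  fit_committee <- Cubist::cubist(x = x, y = y, committees = 10)

  # With committees > 1, the prediction averages over the committee members
  pred_single    <- predict(fit_single, newdata = x)
  pred_committee <- predict(fit_committee, newdata = x)

  rmse <- function(actual, fitted) sqrt(mean((actual - fitted)^2))
  print(c(single = rmse(y, pred_single), committee = rmse(y, pred_committee)))
} else {
  message("Cubist package not installed; skipping the example.")
}
```

The comparison here is in-sample, so it says nothing about out-of-sample accuracy; a holdout split or cross-validation would be needed for that.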
Autoregressive Conditional Poisson Model – I
Modeling time series of count outcomes is of interest in operational risk when forecasting the frequency of losses. Below is an example showing how to estimate a simple ACP(1, 1) model, i.e. Autoregressive Conditional Poisson, without covariates with the acp package. library(acp) ### acp(1, 1) without covariates ### mdl <- acp(y ~ -1, data ...
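To make the ACP(1, 1) structure concrete, here is a hand-rolled base R sketch that simulates a count series and estimates the three parameters by maximum likelihood. It is not the acp package's implementation; the parameter values, starting values, and the choice to initialize the intensity at the sample mean are all assumptions made for illustration.

```r
# Sketch only: ACP(1, 1) without covariates, estimated with optim().
# Model: y[t] ~ Poisson(lambda[t]),
#        lambda[t] = omega + alpha * y[t - 1] + beta * lambda[t - 1]
set.seed(2015)

# Simulate from hypothetical "true" parameters
n <- 1000
omega <- 1.0; alpha <- 0.3; beta <- 0.4
y <- numeric(n); lambda <- numeric(n)
lambda[1] <- omega / (1 - alpha - beta)   # unconditional mean
y[1] <- rpois(1, lambda[1])
for (t in 2:n) {
  lambda[t] <- omega + alpha * y[t - 1] + beta * lambda[t - 1]
  y[t] <- rpois(1, lambda[t])
}

# Negative log-likelihood; parameters live on the log scale to stay positive
acp_nll <- function(par, y) {
  p <- exp(par)
  lam <- numeric(length(y))
  lam[1] <- mean(y)                       # assumed initial intensity
  for (t in 2:length(y)) {
    lam[t] <- p[1] + p[2] * y[t - 1] + p[3] * lam[t - 1]
  }
  nll <- -sum(dpois(y, lam, log = TRUE))
  if (!is.finite(nll)) 1e10 else nll      # guard against explosive paths
}

start <- log(c(0.5, 0.2, 0.2))
fit <- optim(start, acp_nll, y = y, control = list(maxit = 2000))
est <- setNames(exp(fit$par), c("omega", "alpha", "beta"))
print(round(est, 3))
```

Optimizing on the log scale is one simple way to keep all three parameters positive; it does not enforce alpha + beta < 1, which a more careful implementation might.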
rPithon vs. rPython
Similar to rPython, the rPithon package (http://rpithon.r-forge.r-project.org) allows users to execute Python code from R and exchange data between Python and R. However, the underlying mechanisms of these two packages are fundamentally different. While rPithon communicates with Python from R through pipes, rPython accomplishes the same ...
Modeling Count Time Series with tscount Package
The example below shows how to estimate a simple univariate Poisson time series model with the tscount package. While the model estimation is straightforward and yields parameter estimates very similar to the ones generated with the acp package (https://statcompute.wordpress.com/2015/03/29/autoregressive-conditional-poisson-model-i), the predicti...
To Difference or Not To Difference?
In time series textbooks, we are taught to difference a time series in order to obtain a stationary series, a choice that can be justified by various plots and statistical tests. In real-world time series analysis, things are not always as clear as the textbook suggests. For instance, although the ACF plot shows a not-so-slow dec...
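The textbook workflow mentioned above (inspect the ACF, run a unit-root test, then difference) can be sketched in base R as follows. The simulated random walk and the use of the Phillips-Perron test from the stats package are assumptions for illustration; the original post may rely on different data and tests.

```r
# Sketch only: textbook check of a simulated random walk before and after differencing.
set.seed(42)
n <- 200
random_walk <- cumsum(rnorm(n))        # non-stationary in levels by construction

# Phillips-Perron unit-root test (null hypothesis: the series has a unit root)
pp_level <- PP.test(random_walk)
pp_diff  <- PP.test(diff(random_walk))

# Sample autocorrelations; element 2 of $acf is the lag-1 autocorrelation
acf_level <- acf(random_walk, lag.max = 10, plot = FALSE)
acf_diff  <- acf(diff(random_walk), lag.max = 10, plot = FALSE)

print(c(p_level = pp_level$p.value, p_diff = pp_diff$p.value))
print(c(acf1_level = acf_level$acf[2], acf1_diff = acf_diff$acf[2]))
```

In this clean simulated case the evidence points the same way on every measure; the point of the post is that real series are often less clear-cut.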
Read A Block of Spreadsheet with R
In R, there are two ways to read a block of a spreadsheet, e.g. an xlsx file, such as the one shown below. The xlsx package provides the most intuitive interface with the readColumns() function, which explicitly defines the starting and ending columns and rows. library(xlsx) file <- loadWorkbook("C:\\Documents and Settings\\Administrator\\Desktop\\test.xlsx...
Granger Causality Test
# READ QUARTERLY DATA FROM CSV library(zoo) ts1 <- read.zoo('Documents/data/macros.csv', header = TRUE, sep = ",", FUN = as.yearqtr) # CONVERT THE DATA TO STATIONARY TIME SERIES ts1$hpi_rate <- log(ts1$hpi / lag(ts1$hpi)) ts1$unemp_rate <- log(ts1$unemp / lag(ts1$unemp)) ts2 <- ts1[1:(nrow(ts1) - 1), c(3, 4)] # METHOD 1: LMTEST PACKAGE library(lmtes...
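For readers who want to see what a Granger causality test computes, the base R sketch below compares a restricted and an unrestricted lag regression with an F-test. It is a hand-rolled illustration on simulated data, not the lmtest implementation used in the post; the helper name granger_f, the lag order, and the simulated coefficients are hypothetical.

```r
# Sketch only: Granger causality as a nested-model F-test in base R.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- numeric(n)
for (t in 2:n) {
  y[t] <- 0.5 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1)   # x leads y by construction
}

# granger_f() is a hypothetical helper: does `cause` help predict `effect`?
granger_f <- function(cause, effect, p = 1) {
  # embed() puts the current value in column 1 and lags 1..p in the other columns
  e <- embed(effect, p + 1)
  y_now <- e[, 1]
  y_lag <- e[, -1, drop = FALSE]
  x_lag <- embed(cause, p + 1)[, -1, drop = FALSE]
  restricted   <- lm(y_now ~ y_lag)           # own lags only
  unrestricted <- lm(y_now ~ y_lag + x_lag)   # own lags plus lags of the candidate cause
  anova(restricted, unrestricted)
}

res_xy <- granger_f(x, y)   # H0: x does not Granger-cause y
res_yx <- granger_f(y, x)   # H0: y does not Granger-cause x
p_xy <- res_xy[["Pr(>F)"]][2]
p_yx <- res_yx[["Pr(>F)"]][2]
print(c(x_causes_y = p_xy, y_causes_x = p_yx))
```

A small p-value in the first test rejects the hypothesis that lags of x add no predictive power for y beyond y's own lags.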
Are These Losses from The Same Distribution?
In Advanced Measurement Approaches (AMA) for Operational Risk models, the bank needs to segment operational losses into homogeneous segments known as “Units of Measure (UoM)”, which are often defined by the combination of lines of business (LOB) and Basel II event types. However, how do we verify whether the losses in one UoM are statistical...
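One common way to compare the loss distributions of two segments is the two-sample Kolmogorov-Smirnov test, sketched below in base R. The excerpt above is truncated, so this is not necessarily the test used in the post; the simulated lognormal losses and the segment names are hypothetical.

```r
# Sketch only: two-sample Kolmogorov-Smirnov test on simulated loss severities.
set.seed(123)
segment_a <- rlnorm(500, meanlog = 10, sdlog = 1.5)
segment_b <- rlnorm(500, meanlog = 10, sdlog = 1.5)   # same distribution as segment_a
segment_c <- rlnorm(500, meanlog = 11, sdlog = 1.5)   # shifted distribution

# H0: both samples are drawn from the same continuous distribution
ks_same <- ks.test(segment_a, segment_b)
ks_diff <- ks.test(segment_a, segment_c)

print(c(same = ks_same$p.value, different = ks_diff$p.value))
```

A small p-value suggests the two segments should not be pooled into one UoM, while a large one only means the test found no evidence of a difference.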
Some Considerations of Modeling Severity in Operational Losses
In the Loss Distribution Approach (LDA) for Operational Risk models, multiple distributions, including Log Normal, Gamma, Burr, Pareto, and so on, can be considered as candidates for the severity distribution. However, the challenge remains in stress testing exercises, e.g. CCAR, to relate operational losses to macro-economic scenar...
Estimating Quasi-Poisson Regression with GLIMMIX in SAS
When modeling the frequency measure in operational risk with regressions, most modelers prefer Poisson or Negative Binomial regressions as industry best practice. However, as an alternative, Quasi-Poisson regression provides a more flexible model estimation routine with at least two benefits. First of all, Quasi-Poisson...
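Although the post estimates the model with GLIMMIX in SAS, the relationship between Poisson and Quasi-Poisson regression can be illustrated in base R. The sketch below uses simulated overdispersed counts (an assumption for illustration) and shows a general property of the quasi-likelihood approach: the coefficients match the Poisson fit, while the standard errors are scaled by the square root of the estimated dispersion.

```r
# Sketch only: Poisson vs. Quasi-Poisson on simulated overdispersed counts.
set.seed(7)
n <- 500
x <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)
y <- rnbinom(n, mu = mu, size = 2)     # negative binomial draws: variance exceeds the mean

fit_pois <- glm(y ~ x, family = poisson)
fit_qp   <- glm(y ~ x, family = quasipoisson)

# Pearson-based dispersion estimate reported by the quasi-Poisson fit
dispersion <- summary(fit_qp)$dispersion

se_pois <- coef(summary(fit_pois))[, "Std. Error"]
se_qp   <- coef(summary(fit_qp))[, "Std. Error"]

print(rbind(poisson = coef(fit_pois), quasipoisson = coef(fit_qp)))
print(rbind(se_poisson = se_pois, se_quasipoisson = se_qp))
print(dispersion)
```

When the dispersion estimate is well above 1, the plain Poisson standard errors are too small, which is the usual motivation for the quasi-likelihood adjustment.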