Publications by statcompute
Clojure Integration with R
(require '[tnoda.rashinban :as rr]
         '[tnoda.rashinban.core :as rc]
         '[clojure.core.matrix.dataset :as dt]
         '[clojure.core.matrix.impl.dataset :as id])
;; CREATE A TOY DATA
(def ds [{:id 1.0 :name "name1"} {:id 2.0 :name "name2"} {:id 3.0 :name "name3"}])
;; RUN THE FOLLOWING R CODE IN ADVANCE TO START T...
431 sym R (1656 sym/1 pcs)
LogRatio Regression – A Simple Way to Model Compositional Data
Compositional data are proportions of mutually exclusive groups that sum to unity. Statistical models for compositional data are applicable in a number of areas, e.g. the product or channel mix in marketing research and the asset allocation of an investment portfolio. In the example below, I will show how to model ...
1247 sym R (1407 sym/3 pcs)
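The excerpt above is cut off before it names the transformation it uses, so the following is only a minimal sketch of one common log-ratio approach, the additive log-ratio (ALR) transform, written in Python rather than R. The function names `alr` and `alr_inverse` and the choice of the last group as the reference are assumptions for illustration, not taken from the post.

```python
import math

def alr(parts):
    """Additive log-ratio: log of each part relative to the last part.

    Assumes every part is strictly positive; the last part is the
    (arbitrarily chosen) reference group.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    ref = parts[-1]
    return [math.log(p / ref) for p in parts[:-1]]

def alr_inverse(z):
    """Map log-ratios back to proportions that sum to one."""
    expz = [math.exp(v) for v in z] + [1.0]  # reference group has ratio 1
    total = sum(expz)
    return [e / total for e in expz]

# Hypothetical three-group mix, e.g. shares of three channels.
mix = [0.5, 0.3, 0.2]
z = alr(mix)            # two unconstrained values for three groups
back = alr_inverse(z)   # recovers the original proportions
```

Under this sketch, each log-ratio is unconstrained on the real line, so it could be modeled with an ordinary regression and the fitted values mapped back to proportions with the inverse.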
Read Random Rows from A Huge CSV File
Because R data frames are stored in memory, it is sometimes beneficial to sample and examine the data in a large CSV file before importing it into a data frame. To the best of my knowledge, there is no off-the-shelf R function that performs such sampling at a relatively low computing cost. Therefore, I drafted two utility functions serving this...
1084 sym R (1495 sym/1 pcs)
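The excerpt above does not show the two utility functions, so the following is not the author's code. It is a minimal Python sketch of one standard way to draw random rows from a file too large to load, reservoir sampling, which reads the file once and keeps only k rows in memory. The function name `sample_csv_rows` and the in-memory demo data are hypothetical.

```python
import csv
import io
import random

def sample_csv_rows(handle, k, seed=None):
    """Return (header, k uniformly sampled rows) in a single pass.

    Reservoir sampling: the first k rows fill the reservoir, then row i
    (0-based) replaces a random slot with probability k / (i + 1).
    """
    rng = random.Random(seed)
    reader = csv.reader(handle)
    header = next(reader, None)  # None if the file is empty
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return header, reservoir

# Demo on an in-memory "file"; a real call would pass open(path, newline="").
demo = io.StringIO("id,x\n" + "".join(f"{i},{i * i}\n" for i in range(1000)))
header, rows = sample_csv_rows(demo, k=5, seed=42)
```

Memory use depends on k rather than on the file size, at the cost of always scanning the whole file.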
MLE with General Optimization Functions in R
In my previous post (https://statcompute.wordpress.com/2018/02/25/mle-in-r/), it is shown how to obtain the MLE from the log likelihood function with a general-purpose optimization algorithm, e.g. optim(), and that the optimizer is more flexible and efficient than wrappers in statistical packages. A benchmark comparison is given below s...
2098 sym R (1573 sym/1 pcs)
Mimicking SQLDF with MonetDBLite
Like many useRs, I am also a big fan of the sqldf package developed by Grothendieck, which uses SQL statements for data frame manipulation with the SQLite embedded database as the default back-end. In the examples below, I drafted a couple of R utility functions with the MonetDBLite back-end by mimicking the sqldf function. There are several interesting ob...
1267 sym R (4305 sym/1 pcs)
Co-integration and Pairs Trading
Co-integration is an important statistical concept behind the statistical arbitrage strategy named “Pairs Trading”. While projecting a stock price with time series models is by all means difficult, it is technically feasible to find a pair of (or even a portfolio of) stocks sharing a common trend such that a linear combination of two se...
1860 sym R (2080 sym/1 pcs) 4 img
Ordered Probit Model and Price Movements of High-Frequency Trades
The analysis of high-frequency stock transactions plays an important role in algorithmic trading, and the results can be used to monitor stock movements and to develop trading strategies. In the paper “An Ordered Probit Analysis of Transaction Stock Prices” (1992), Hausman, Lo, and MacKinlay discussed estimating trade-by-trade stock pr...
1696 sym R (2869 sym/2 pcs) 4 img
Adjacent-Categories and Continuation-Ratio Logit Models for Ordinal Outcomes
In the previous post (https://statcompute.wordpress.com/2018/01/28/modeling-lgd-with-proportional-odds-model), I’ve shown how to estimate a standard Cumulative Logit model with the ordinal::clm function and its use case in credit risk models. To provide a better illustration of the underlying logic, an example is also provided below, showing ...
1964 sym R (3231 sym/4 pcs)
More Flexible Ordinal Outcome Models
In the previous post (https://statcompute.wordpress.com/2018/08/26/adjacent-categories-and-continuation-ratio-logit-models-for-ordinal-outcomes), we’ve shown alternative models for ordinal outcomes in addition to the commonly used Cumulative Logit models under the proportional odds assumption, which are also known as Proportional Odds models. A pote...
3426 sym R (8357 sym/4 pcs)
Playing Map() and Reduce() in R – By-Group Calculation
Clojure is such an interesting programming language that it can not only enhance our skill set but also change the way we write programs. After learning Clojure, I can’t help thinking about how to employ functional programming and the MapReduce paradigm to improve our experience with other programming languages, e.g. R in my case....
4128 sym R (1129 sym/4 pcs)
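The post itself works with R's Map() and Reduce(), which the excerpt above does not reproduce. As a rough illustration of the same by-group idea, here is a minimal Python sketch using `functools.reduce` and the built-in `map`. The toy records and the helper name `add_record` are made up for the example.

```python
from functools import reduce

# Hypothetical toy data: one record per observation.
records = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 10.0},
    {"group": "b", "value": 20.0},
    {"group": "b", "value": 30.0},
]

def add_record(acc, rec):
    """Fold one record into a {group: (count, total)} accumulator.

    Returns a new dict instead of mutating, in keeping with the
    functional style the post describes.
    """
    n, total = acc.get(rec["group"], (0, 0.0))
    return {**acc, rec["group"]: (n + 1, total + rec["value"])}

# Reduce step: collapse all records into per-group counts and totals.
totals = reduce(add_record, records, {})

# Map step: turn each (count, total) pair into a group mean.
means = dict(map(lambda kv: (kv[0], kv[1][1] / kv[1][0]), totals.items()))
```

Copying the accumulator on every step keeps the fold free of side effects but costs more than mutating in place, so for large data an in-place update may be the more practical choice.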