Publications by statcompute
General Regression Neural Network with R
Like the back-propagation neural network, the general regression neural network (GRNN) is a good tool for function approximation in the modeling toolbox. Proposed by Specht in 1991, GRNN has the advantages of instant training and easy tuning: a GRNN is formed instantly with a single pass over the development data. In the...
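The one-pass training and single-knob tuning described above can be sketched in a few lines of base R. This is a minimal sketch, not the author's implementation: `grnn_predict()` is a hypothetical helper name, and it assumes the Gaussian kernel on squared Euclidean distance from Specht's paper, with the smoothing width `sigma` as the only tuning parameter. "Training" is nothing more than keeping the development data.

```r
# Hypothetical helper (not from the original post): GRNN prediction with a
# Gaussian kernel. "Training" amounts to keeping X_train and y_train.
grnn_predict <- function(X_train, y_train, X_new, sigma) {
  X_train <- as.matrix(X_train)
  X_new <- as.matrix(X_new)
  apply(X_new, 1, function(x) {
    # squared Euclidean distance from x to every stored training row
    d2 <- colSums((t(X_train) - x)^2)
    # pattern-layer activations; these can underflow to 0 if sigma is tiny
    w <- exp(-d2 / (2 * sigma^2))
    # summation/output layer: kernel-weighted average of the responses
    sum(w * y_train) / sum(w)
  })
}

# Usage on a toy 1-D problem: approximate sin(x) from a grid of points
x <- seq(0, 2 * pi, length.out = 200)
y <- sin(x)
fit_smooth <- grnn_predict(x, y, c(1, 2, 3), sigma = 0.1)
fit_exact <- grnn_predict(x, y, x, sigma = 1e-3)  # tiny sigma: near lookup table
print(fit_smooth)
```

A larger `sigma` gives a smoother fit, while a very small one simply reproduces the training responses, which is why tuning a GRNN comes down to choosing one number (e.g. by holdout error).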
Prototyping A General Regression Neural Network with SAS
The last time I read the paper “A General Regression Neural Network” by Donald Specht was exactly 10 years ago, when I was in graduate school. After rereading it this week, I decided to code it up with SAS macros and make this excellent idea available to the SAS community. The GRNN prototype consists of 2 SAS macros, %grnn_lear...
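For reference, the estimator from Specht's paper that any GRNN prototype has to evaluate, whether in SAS or elsewhere, is a kernel-weighted average of the stored responses $Y_i$, with Gaussian weights on the squared distance between a new input $X$ and each stored input $X_i$, and a single smoothing parameter $\sigma$. This is the general formula, not a transcription of the SAS macros.

```latex
\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\!\left(-\frac{D_i^2}{2\sigma^2}\right)}
                  {\sum_{i=1}^{n} \exp\!\left(-\frac{D_i^2}{2\sigma^2}\right)},
\qquad D_i^2 = (X - X_i)^{\top}(X - X_i)
```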
GRNN and PNN
From a technical perspective, people usually choose GRNN (general regression neural network) for function approximation with a continuous response variable and PNN (probabilistic neural network) for pattern recognition / classification problems with categorical outcomes. However, from a practical standpoint, it is often not n...
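To make the GRNN/PNN contrast concrete, here is a minimal base-R sketch of a PNN classifier, assuming the textbook Parzen-window form with one Gaussian kernel per training case. The helper name `pnn_predict()`, `sigma = 0.5`, the standardization step, and the 100/50 split are illustrative assumptions, not taken from the original post. Compared with a GRNN, only the output layer changes: kernels are averaged within each class, and the class with the largest estimated density wins.

```r
# Hypothetical helper: PNN-style classification with Gaussian kernels.
pnn_predict <- function(X_train, y_train, X_new, sigma) {
  X_train <- as.matrix(X_train)
  X_new <- as.matrix(X_new)
  classes <- levels(y_train)
  # One column per class: Parzen estimate of the class-conditional density,
  # up to a normalizing constant shared by all classes. Averaging (rather
  # than summing) within each class amounts to assuming equal class priors.
  scores <- sapply(classes, function(k) {
    Xk <- X_train[y_train == k, , drop = FALSE]
    apply(X_new, 1, function(x) mean(exp(-colSums((t(Xk) - x)^2) / (2 * sigma^2))))
  })
  scores <- matrix(scores, nrow = nrow(X_new))  # keep matrix shape for 1 row
  factor(classes[max.col(scores, ties.method = "first")], levels = classes)
}

# Usage on iris with standardized inputs and a random 100/50 split
data(iris)
set.seed(1991)
X <- scale(iris[, 1:4])
train <- sample(nrow(iris), 100)
pred <- pnn_predict(X[train, ], iris$Species[train], X[-train, ], sigma = 0.5)
accuracy <- mean(pred == iris$Species[-train])
print(accuracy)
```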
Prototyping Multinomial Logit with R
Recently, I have been working on a new modeling proposal based on competing risks and needed to prototype multinomial logit models with R. I've tested two R packages that implement multinomial logit models, namely nnet and VGAM. Model outputs with the iris data are shown below.
data(iris)
### method 1: nnet package ###
library(nnet)
mdl1 <- mu...
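The multinomial logit fitted by these packages is a softmax over K - 1 linear predictors, with the first factor level as the baseline category. A small sketch can confirm this by rebuilding the fitted probabilities of `nnet::multinom()` by hand from `coef()`. Using only the two sepal measurements is an assumption made here to keep the fit well behaved (iris with all four predictors is close to separable); it is not the author's model.

```r
library(nnet)  # recommended package, shipped with standard R installs
data(iris)

# Baseline-category multinomial logit; "setosa" (first level) is the baseline
mdl <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris, trace = FALSE)

B <- coef(mdl)  # (K - 1) x p matrix: rows "versicolor" and "virginica"
X <- model.matrix(~ Sepal.Length + Sepal.Width, data = iris)

# Linear predictors, with eta fixed at 0 for the baseline class
eta <- cbind(0, X %*% t(B))
eta <- eta - apply(eta, 1, max)           # stabilize exp() row by row
p_manual <- exp(eta) / rowSums(exp(eta))  # softmax

p_nnet <- fitted(mdl)  # 150 x 3 matrix of fitted class probabilities
print(max(abs(p_manual - p_nnet)))
```

Because the baseline row of coefficients is fixed at zero, each row of `coef()` reads directly as log-odds against setosa.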
Generate and Retrieve Many Objects with Sequential Names
While coding ensemble methods in data mining with R, e.g. bagging, we often need to generate many data and model objects with sequential names. Below is a quick example of how to use the assign() function to generate many prediction objects on the fly and then retrieve these predictions with mget() to do the model averaging.
data(Boston, package = "MA...
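A minimal sketch of the assign()/mget() pattern, assuming a bagged linear regression on the built-in `mtcars` data instead of the `Boston` data used in the post. The `pred_` prefix, the formula `mpg ~ wt + hp`, and the 10 bootstrap replicates are illustrative choices.

```r
data(mtcars)
set.seed(2013)
n_bag <- 10

# assign() creates pred_1, pred_2, ..., pred_10 in the current environment
for (i in seq_len(n_bag)) {
  idx <- sample(nrow(mtcars), replace = TRUE)         # bootstrap sample
  fit <- lm(mpg ~ wt + hp, data = mtcars[idx, ])
  assign(paste0("pred_", i), predict(fit, newdata = mtcars))
}

# mget() pulls the objects back by name as a named list
preds <- mget(paste0("pred_", seq_len(n_bag)))

# Model averaging: element-wise mean of the bagged predictions
bag_pred <- Reduce(`+`, preds) / n_bag
head(bag_pred)
```

Collecting the results with `mget()` rather than `get()` in a loop keeps the averaging to a single `Reduce()` call over the list.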
rPython – R Interface to Python
> library(rPython)
Loading required package: RJSONIO
> ### load r data.frame ###
> data(iris)
> r_df1 <- iris
> class(r_df1)
[1] "data.frame"
> head(r_df1, n = 3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3 ...
Simplex Model in R
R CODE
library(simplexreg)
library(foreign)
### http://fmwww.bc.edu/repec/bocode/k/k401.dta ###
data <- read.dta("/home/liuwensui/Documents/data/k401.dta")
mdl <- simplexreg(prate ~ mrate + totemp + age + sole | mrate + totemp + age + sole,
                  type = "hetero", link = "logit", data = data, subset = prate < 1)
summary(mdl)
R OUTPUT
simplexreg(formu...
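The simplex distribution of Barndorff-Nielsen and Jørgensen, which underlies `simplexreg()`, lives on the open interval (0, 1) with mean mu and dispersion sigma2. Its density is f(y) = [2 pi sigma2 {y(1 - y)}^3]^(-1/2) exp{-d(y; mu) / (2 sigma2)}, with unit deviance d(y; mu) = (y - mu)^2 / {y(1 - y) mu^2 (1 - mu)^2}. Because the support excludes 0 and 1, this is presumably why the model above is fitted with `subset = prate < 1`. The base-R sketch below assumes that parameterization and checks numerically that the density integrates to 1 with mean mu. `dsimplex_sketch()` is a hypothetical helper name, not a function from the simplexreg package.

```r
# Hypothetical helper: simplex density on (0, 1) with mean mu, dispersion sigma2
dsimplex_sketch <- function(y, mu, sigma2) {
  inside <- y > 0 & y < 1
  yy <- ifelse(inside, y, 0.5)  # dummy value outside the support avoids NaN
  d <- (yy - mu)^2 / (yy * (1 - yy) * mu^2 * (1 - mu)^2)  # unit deviance
  dens <- exp(-d / (2 * sigma2)) / sqrt(2 * pi * sigma2 * (yy * (1 - yy))^3)
  ifelse(inside, dens, 0)
}

mu <- 0.3
sigma2 <- 0.5
total <- integrate(dsimplex_sketch, 0, 1, mu = mu, sigma2 = sigma2)$value
mean_y <- integrate(function(y) y * dsimplex_sketch(y, mu, sigma2), 0, 1)$value
print(c(total = total, mean = mean_y))
```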
Julia and SQLite
Similar to R and Pandas in Python, Julia provides a simple yet efficient interface to SQLite databases. In addition, the sqldf() function in the SQLite package, which is almost identical to the sqldf package in R, is extremely handy for data munging.
julia> # LOADING SQLITE PACKAGE
julia> using SQLite
julia> # CONNECT TO THE SQLITE DB FILE ...
Efficiency of Importing Large CSV Files in R
### size of csv file: 689.4MB (7,009,728 rows * 29 columns) ###
system.time(read.csv('../data/2008.csv', header = T))
#   user  system elapsed
# 88.301   2.416  90.716
library(data.table)
system.time(fread('../data/2008.csv', header = T, sep = ','))
#   user  system elapsed
#  4.740   0.048   4.785
library(bigmemory)
system.time(read.big.ma...
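Before reaching for another package, it can be worth timing read.csv() with type hints, since the `read.table` help page notes that specifying `colClasses` and `nrows` reduces memory use; whether it also saves time on a given file has to be measured. The sketch below writes a synthetic CSV to a temporary file so the comparison can be rerun anywhere. The 50,000 rows, the three columns, and their names are hypothetical, and no particular timing is implied.

```r
# Synthetic CSV so the timing comparison can be rerun anywhere
# (hypothetical data; the post's 2008.csv is not needed)
set.seed(2008)
n <- 50000
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = seq_len(n), x = rnorm(n), y = runif(n)),
          tmp, row.names = FALSE)

# Default call: read.csv() has to infer every column type
t_default <- system.time(df1 <- read.csv(tmp, header = TRUE))

# Same file with explicit column classes and row count
t_hinted <- system.time(
  df2 <- read.csv(tmp, header = TRUE,
                  colClasses = c("integer", "numeric", "numeric"),
                  nrows = n)
)

# user / system / elapsed seconds for each call
print(rbind(default = t_default[1:3], hinted = t_hinted[1:3]))
unlink(tmp)
```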
Chain Operations: An Interesting Feature in dplyr Package
library(data.table)
library(dplyr)
data1 <- fread('/home/liuwensui/Downloads/2008.csv', header = T, sep = ',')
dim(data1)
# [1] 7009728      29
data2 <- data1 %.%
  filter(Year == 2008, Month %in% c(1, 2, 3, 4, 5, 6)) %.%
  select(Year, Month, AirTime) %.%
  group_by(Year, Month) %.%
  summarize(avg_time = mea...
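For comparison, roughly the same filter, select, group and summarize chain can be sketched with base R's native pipe `|>`, using `subset()` and `aggregate()` in place of dplyr verbs. This assumes R 4.2 or later, which is needed for the `_` placeholder. The `flights` data frame is a small synthetic stand-in for the 2008 airline file, so its values are meaningless; only the shape of the pipeline matters.

```r
# Synthetic stand-in for the airline data (hypothetical values)
set.seed(2008)
flights <- data.frame(
  Year    = 2008L,
  Month   = sample(1:12, 500, replace = TRUE),
  AirTime = round(runif(500, 30, 300))
)

# filter -> select -> group_by + summarize, written as one base-R chain
avg_by_month <- flights |>
  subset(Year == 2008 & Month %in% 1:6, select = c(Year, Month, AirTime)) |>
  aggregate(AirTime ~ Year + Month, data = _, FUN = mean)

print(avg_by_month)
```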