Publications by statcompute

General Regression Neural Network with R

16.06.2013

Like the back-propagation neural network, the general regression neural network (GRNN) is a good tool for function approximation in the modeling toolbox. Proposed by Specht in 1991, GRNN has the advantages of instant training and easy tuning. A GRNN is formed instantly with a single pass through the development data. In the...

1959 sym R (1465 sym/1 pcs) 8 img
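
The one-pass training mentioned in the GRNN entry above can be illustrated with a minimal base-R sketch. It assumes the usual formulation of a GRNN as a Gaussian-kernel weighted average of the training responses; the function name `grnn_predict`, the smoothing value `sigma = 0.2`, and the toy data are hypothetical and are not taken from the original post.

```r
# Minimal GRNN sketch in base R. "Training" only stores the data;
# all the work happens at prediction time.
# grnn_predict is a hypothetical helper, not a function from any package.
grnn_predict <- function(x_train, y_train, x_new, sigma = 0.5) {
  x_train <- as.matrix(x_train)
  x_new <- as.matrix(x_new)
  apply(x_new, 1, function(x) {
    # squared Euclidean distance from x to every training pattern
    d2 <- rowSums(sweep(x_train, 2, x)^2)
    # Gaussian kernel weights controlled by the smoothing parameter sigma
    w <- exp(-d2 / (2 * sigma^2))
    # prediction is the weighted average of the training responses
    sum(w * y_train) / sum(w)
  })
}

# toy data (assumed for illustration): y = x^2 on a regular grid
x <- matrix(seq(-2, 2, length.out = 50), ncol = 1)
y <- x[, 1]^2
pred <- grnn_predict(x, y, x, sigma = 0.2)
```

The only tuning parameter in this sketch is `sigma`: smaller values track the training points more closely, larger values smooth more heavily.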

Prototyping A General Regression Neural Network with SAS

22.06.2013

I last read the paper “A General Regression Neural Network” by Donald Specht exactly 10 years ago, when I was in graduate school. After reading it again this week, I decided to code it in SAS macros and make this excellent idea available to the SAS community. The prototype of GRNN consists of 2 SAS macros, %grnn_lear...

2102 sym R (6932 sym/3 pcs) 4 img

GRNN and PNN

23.06.2013

From a technical perspective, people usually choose GRNN (general regression neural network) for function approximation with a continuous response variable and PNN (probabilistic neural network) for pattern recognition / classification problems with categorical outcomes. However, from a practical standpoint, it is often not n...

1822 sym R (1248 sym/2 pcs) 4 img

Prototyping Multinomial Logit with R

21.08.2013

Recently, I have been working on a new modeling proposal based on competing risks and need to prototype multinomial logit models with R. I’ve tested two R packages implementing multinomial logit models, namely nnet and VGAM. Model outputs with iris data are shown below. data(iris) ### method 1: nnet package ### library(nnet) mdl1 <- mu...

1143 sym R (1473 sym/2 pcs) 4 img
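
As a rough illustration of the multinomial logit entry above, the sketch below fits a model to the iris data with `multinom()` from the nnet package. The choice of predictors and the object names are assumptions made for this example; the excerpt is truncated, so this is not necessarily the call used in the original post.

```r
# Multinomial logit on iris with nnet::multinom.
# The formula and object names are assumed for illustration.
library(nnet)

data(iris)
# trace = FALSE suppresses the iteration log printed during fitting
mdl <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris, trace = FALSE)

# predicted class probabilities: one row per observation, one column per species
probs <- predict(mdl, newdata = iris, type = "probs")
# predicted class labels
cls <- predict(mdl, newdata = iris, type = "class")
```

Each row of `probs` sums to one, which is a quick sanity check when comparing outputs across packages.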

Generate and Retrieve Many Objects with Sequential Names

08.09.2013

While coding ensemble methods in data mining with R, e.g. bagging, we often need to generate many data and model objects with sequential names. Below is a quick example of how to use the assign() function to generate many prediction objects on the fly and then retrieve these predictions with mget() for model averaging. data(Boston, package = "MA...

756 sym R (701 sym/1 pcs) 4 img
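
The assign() / mget() pattern from the entry above can be sketched as follows. This is a minimal illustration under assumed choices: the built-in mtcars data, a simple linear model, and ten bootstrap samples stand in for whatever data and models the original post used.

```r
# Bagging sketch: create sequentially named prediction objects with assign(),
# then collect them with mget() and average.
# Data set, model, and number of models are assumed for illustration.
set.seed(2013)
n_models <- 10

for (i in seq_len(n_models)) {
  # bootstrap sample of row indices
  idx <- sample(nrow(mtcars), replace = TRUE)
  fit <- lm(mpg ~ wt + hp, data = mtcars[idx, ])
  # creates objects named pred1, pred2, ..., pred10 in the current environment
  assign(paste0("pred", i), predict(fit, newdata = mtcars))
}

# retrieve all prediction vectors by name as a list
pred_list <- mget(paste0("pred", seq_len(n_models)))
# model averaging: element-wise mean across the predictions
avg_pred <- Reduce("+", pred_list) / n_models
```

Storing the predictions in a list from the start would avoid assign() altogether; the named-object approach is shown here only because it is the technique the entry describes.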

rPython – R Interface to Python

13.10.2013

> library(rPython) Loading required package: RJSONIO > ### load r data.frame ### > data(iris) > r_df1 <- iris > class(r_df1) [1] "data.frame" > head(r_df1, n = 3) Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2 setosa 3 ...

434 sym R (1899 sym/1 pcs) 4 img

Simplex Model in R

02.02.2014

R CODE library(simplexreg) library(foreign) ### http://fmwww.bc.edu/repec/bocode/k/k401.dta ### data <- read.dta("/home/liuwensui/Documents/data/k401.dta") mdl <- simplexreg(prate ~ mrate + totemp + age + sole|mrate + totemp + age + sole, type = "hetero", link = "logit", data = data, subset = prate < 1) summary(mdl) R OUTPUT simplexreg(formu...

484 sym R (2861 sym/3 pcs) 4 img

Julia and SQLite

08.02.2014

Like R and Pandas in Python, Julia provides a simple yet efficient interface to SQLite databases. In addition, the sqldf() function in the SQLite package, which is almost identical to the sqldf package in R, is extremely handy for data munging. julia> # LOADING SQLITE PACKAGE julia> using SQLite julia> # CONNECT TO THE SQLITE DB FILE ...

688 sym Python (1538 sym/1 pcs) 4 img

Efficiency of Importing Large CSV Files in R

10.02.2014

### size of csv file: 689.4MB (7,009,728 rows * 29 columns) ### system.time(read.csv('../data/2008.csv', header = T)) # user system elapsed # 88.301 2.416 90.716 library(data.table) system.time(fread('../data/2008.csv', header = T, sep = ',')) # user system elapsed # 4.740 0.048 4.785 library(bigmemory) system.time(read.big.ma...

433 sym R (681 sym/1 pcs) 4 img

Chain Operations: An Interesting Feature in dplyr Package

28.07.2014

library(data.table) library(dplyr) data1 <- fread('/home/liuwensui/Downloads/2008.csv', header = T, sep = ',') dim(data1) # [1] 7009728 29 data2 <- data1 %.% filter(Year == 2008, Month %in% c(1, 2, 3, 4, 5, 6)) %.% select(Year, Month, AirTime) %.% group_by(Year, Month) %.% summarize(avg_time = mea...

433 sym R (597 sym/1 pcs) 4 img