Publications by statcompute

Fit and Visualize A MARS Model

07.10.2012

#################################################
## FIT A MULTIVARIATE ADAPTIVE REGRESSION      ##
## SPLINES MODEL (MARS) USING MDA PACKAGE      ##
## DEVELOPED BY HASTIE AND TIBSHIRANI          ##
#################################################
# LOAD LIBRARIES AND DATA
library(MASS); library(mda); data(Boston);
# FIT AN ADDITIVE MARS MODE...


Download Stock Price Online with R

11.10.2012

library(chron)
library(zoo)
# STOCK TICKER OF Fifth Third Bancorp
stock <- 'FITB'
# DEFINE STARTING DATE
start.date <- 1
start.month <- 1
start.year <- 2012
# DEFINE ENDING DATE
end.date <- 11
end.month <- 10
end.year <- 2012
# DEFINE URL LINK
link <- paste("http://ichart.finance.yahoo.com/table.csv?s=", stock, "&a=", as.ch...


A Light Touch on RPy2

23.11.2012

For a statistical analyst, the first step in a data analysis project is to import the data and then screen its descriptive statistics. In Python, this is easy to do with the pandas package.
In [1]: import pandas as pd
In [2]: data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt", header = 0)
In [3]:...


Run R Code Within Python On The Fly

24.11.2012

Below is an example showing how to run R code within Python, which is an extremely attractive feature for hardcore R programmers.
In [1]: import rpy2.robjects as ro
In [2]: _null_ = ro.r('data <- read.table("/home/liuwensui/data/credit_count.txt", header = TRUE, sep = ",")')
In [3]: print ro.r('str(data)')
'data.frame': 13444 obs. of 14 variab...


Another Way to Access R from Python – PypeR

29.11.2012

Unlike RPy2, PypeR provides another simple way to access R from Python, through pipes (http://www.jstatsoft.org/v35/c02/paper). This handy feature lets data analysts do the data munging in Python and the statistical analysis in R by passing objects interactively between the two computing systems. Below is a simple demonstration on h...


Exchange Data between Python and R with SQLite

02.12.2012

SQLite is a lightweight, zero-configuration database. Fast, reliable, and simple, it is a good choice for storing and querying large data, e.g. terabytes, and is well supported by both Python and R.
In [1]: # LOAD PYTHON PACKAGES
In [2]: import pandas as pd
In [3]: import pandas.io.sql as pd_sql
In [4]: import sqlite3 as sql
In [5]: i...
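The preview above is cut off before the data exchange itself, so the following is only a minimal sketch of the Python side, using the standard-library `sqlite3` module alone rather than the `pandas.io.sql` helpers the post imports. The table name `tbl`, its columns, the file name `exchange.db`, and the helper functions are hypothetical, chosen purely for illustration. A SQLite file written this way is an ordinary database file, so an R session with a SQLite driver could in principle open it.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def write_table(db_path, rows):
    """Create (or replace) a small table and load the rows into it."""
    with closing(sqlite3.connect(db_path)) as con:
        with con:  # commits on success, rolls back on error
            con.execute("DROP TABLE IF EXISTS tbl")
            con.execute("CREATE TABLE tbl (label TEXT, value REAL)")
            con.executemany(
                "INSERT INTO tbl (label, value) VALUES (?, ?)", rows
            )


def read_table(db_path):
    """Read every row back in insertion order."""
    with closing(sqlite3.connect(db_path)) as con:
        query = "SELECT label, value FROM tbl ORDER BY rowid"
        return con.execute(query).fetchall()


# Hypothetical data and file location, for illustration only.
db_path = os.path.join(tempfile.mkdtemp(), "exchange.db")
rows = [("A", 4.0), ("B", 3.0), ("C", 6.0)]

write_table(db_path, rows)
result = read_table(db_path)
print(result)
```

Keeping the write inside a transaction means a reader on the other side never sees a half-loaded table.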


Fractional Logit Model with Python

16.12.2012

In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')
In [4]: Y = data.LEV_LT3
In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ', 'IND3A']])
In [6]: # Discrete Dependent Variable Models with Logit Link
In [7]: mod = sm.Logit(Y, X)
In [8]...


Generalized Boosted Regression with A Monotonic Marginal Effect for Each Predictor

18.12.2012

In the practice of risk modeling, it is sometimes mandatory to maintain a monotonic relationship between the response and each predictor. Below is a demonstration showing how to develop a generalized boosted regression with a monotonic marginal effect for each predictor.
##################################################
# FIT A GENERALIZED BOOS...


Removing Records by Duplicate Values

20.12.2012

Removing records from a data table based on duplicate values in one or more columns is a common and important data cleaning technique. The examples below show how to accomplish this task in SAS, R, and Python, respectively.
SAS Example
data _data_;
  input label $ value;
datalines;
A 4
B 3
C 6
B 3
B 1
A 2
A ...
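The Python example from the post is not visible in this preview, so the sketch below is not the author's code. It is a dependency-free illustration of the technique, using only the six data rows that survive in the truncated SAS snippet. The function name `drop_duplicates` and its `key` parameter are hypothetical. It keeps the first record seen for each key, where the key is either the whole row or a chosen column.

```python
def drop_duplicates(rows, key=None):
    """Keep the first row seen for each key.

    With key=None the whole row is the key, so only exact duplicates
    are removed. Otherwise key(row) picks the column(s) to compare.
    """
    seen = set()
    kept = []
    for row in rows:
        k = key(row) if key is not None else row
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


# The six rows visible in the SAS snippet; the rest is truncated.
rows = [("A", 4), ("B", 3), ("C", 6), ("B", 3), ("B", 1), ("A", 2)]

by_row = drop_duplicates(rows)                        # exact duplicates only
by_label = drop_duplicates(rows, key=lambda r: r[0])  # one row per label

print(by_row)
print(by_label)
```

Which record survives for a duplicated key depends on the input order, so sorting the rows first is the usual way to control whether the smallest or largest value is kept.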


Removing Records by Duplicate Values in R – An Efficiency Comparison

20.12.2012

After posting “Removing Records by Duplicate Values” yesterday, I had an interesting exchange with my friend Jeffrey Allard tonight about how to code this in R: with a combination of order() and duplicated(), or with sqldf(). Afterward, I ran a simple efficiency comparison between the two methods, shown below. The comparison result is pretty sel...
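The R benchmark itself is truncated in this preview. As a rough Python analogue of the same comparison (not the author's code, and not a reproduction of the R results), the sketch below times a sort-then-scan approach, similar in spirit to order() plus duplicated(), against a SQL GROUP BY run through an in-memory SQLite table, similar in spirit to sqldf(). The function names, the synthetic data, and the choice to keep the smallest value per label are all assumptions made for illustration.

```python
import random
import sqlite3
import timeit
from contextlib import closing


def dedup_sorted(rows):
    """Sort by (label, value), then keep the first row of each label."""
    kept = []
    last_label = None
    for label, value in sorted(rows):
        if label != last_label:
            kept.append((label, value))
            last_label = label
    return kept


def dedup_sql(rows):
    """Load rows into in-memory SQLite and keep MIN(value) per label."""
    with closing(sqlite3.connect(":memory:")) as con:
        con.execute("CREATE TABLE t (label TEXT, value INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        query = "SELECT label, MIN(value) FROM t GROUP BY label ORDER BY label"
        return con.execute(query).fetchall()


# Synthetic data: 10,000 rows spread over 100 hypothetical labels.
random.seed(2012)
rows = [(f"L{random.randrange(100)}", random.randrange(1000))
        for _ in range(10_000)]

# Both methods should agree before their timings are compared.
assert dedup_sorted(rows) == dedup_sql(rows)

for fn in (dedup_sorted, dedup_sql):
    seconds = timeit.timeit(lambda: fn(rows), number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```

The timings depend on the machine and the data shape, so they say nothing about how the two R methods compare.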
