Publications by statcompute
Fit and Visualize A MARS Model
#################################################
## FIT A MULTIVARIATE ADAPTIVE REGRESSION ##
## SPLINES MODEL (MARS) USING MDA PACKAGE ##
## DEVELOPED BY HASTIE AND TIBSHIRANI ##
#################################################
# LOAD LIBRARIES AND DATA
library(MASS); library(mda); data(Boston);
# FIT AN ADDITIVE MARS MODE...
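MARS builds its fit from pairs of hinge functions, max(0, x - t) and max(0, t - x), placed at knots t. An additive (degree-1) MARS model is simply a weighted sum of such terms. The minimal Python sketch below does not use the mda package and only illustrates how hinge terms combine into a prediction. The knot at 6 and the coefficients are hypothetical, not estimates from the Boston data.

```python
# Minimal sketch of the hinge ("hockey stick") basis functions used by MARS.
# The knot and coefficients below are hypothetical, chosen only for illustration;
# they are NOT estimates from the Boston data.


def hinge_pos(x, knot):
    """Right hinge: max(0, x - knot)."""
    return max(0.0, x - knot)


def hinge_neg(x, knot):
    """Left hinge: max(0, knot - x)."""
    return max(0.0, knot - x)


def mars_predict(x, intercept, terms):
    """Additive MARS prediction for a single predictor.

    terms is a list of (coefficient, basis_function, knot) tuples.
    """
    return intercept + sum(coef * basis(x, knot) for coef, basis, knot in terms)


# Hypothetical fitted model: y = 20 + 1.5 * max(0, x - 6) - 2.0 * max(0, 6 - x)
terms = [(1.5, hinge_pos, 6.0), (-2.0, hinge_neg, 6.0)]

for x in (4.0, 6.0, 8.0):
    print(x, mars_predict(x, 20.0, terms))
```

The result is piecewise linear with a slope of 2.0 left of the knot and 1.5 right of it. MARS chooses knots and keeps or prunes terms from the data, which this sketch does not attempt.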
Download Stock Price Online with R
library(chron)
library(zoo)
# STOCK TICKER OF Fifth Third Bancorp
stock <- 'FITB'
# DEFINE STARTING DATE
start.date <- 1
start.month <- 1
start.year <- 2012
# DEFINE ENDING DATE
end.date <- 11
end.month <- 10
end.year <- 2012
# DEFINE URL LINK
link <- paste("http://ichart.finance.yahoo.com/table.csv?s=", stock, "&a=", as.ch...
A Light Touch on RPy2
For a statistical analyst, the first step in a data analysis project is to import the data into the program and then screen its descriptive statistics. In Python, we can easily do so with the pandas package.
In [1]: import pandas as pd
In [2]: data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt", header = 0)
In [3]:...
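For readers without pandas at hand, the same screening step can be approximated with the Python standard library. This is a swapped-in technique (csv plus statistics, not pandas), and the tab-delimited sample and the column name score are hypothetical stand-ins for a file such as csdata.txt. Quartiles use the "inclusive" method, which interpolates linearly between order statistics.

```python
import csv
import io
import statistics

# Hypothetical tab-delimited sample standing in for a data file such as
# csdata.txt; the column names and values are made up for illustration.
SAMPLE = "id\tscore\n1\t1\n2\t2\n3\t3\n4\t4\n5\t5\n"


def describe(values):
    """Return describe()-style summary statistics for a numeric column."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
scores = [float(row["score"]) for row in reader]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name:>5}: {value:.4f}")
```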
Run R Code Within Python On The Fly
Below is an example showing how to run R code within Python, an extremely attractive feature for hardcore R programmers.
In [1]: import rpy2.robjects as ro
In [2]: _null_ = ro.r('data <- read.table("/home/liuwensui/data/credit_count.txt", header = TRUE, sep = ",")')
In [3]: print(ro.r('str(data)'))
'data.frame': 13444 obs. of 14 variab...
Another Way to Access R from Python – PypeR
Unlike RPy2, PypeR provides another simple way to access R from Python, through pipes (http://www.jstatsoft.org/v35/c02/paper). This handy feature lets data analysts do the data munging in Python and the statistical analysis in R, passing objects interactively between the two computing systems. Below is a simple demonstration on h...
Exchange Data between Python and R with SQLite
SQLite is a lightweight, zero-configuration database. Being fast, reliable, and simple, SQLite is a good choice for storing and querying large data (e.g., terabytes), and it is well supported by both Python and R.
In [1]: # LOAD PYTHON PACKAGES
In [2]: import pandas as pd
In [3]: import pandas.io.sql as pd_sql
In [4]: import sqlite3 as sql
In [5]: i...
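A minimal sketch of the Python side of such an exchange, using only the standard sqlite3 module rather than pandas. The table name tbl and its rows are hypothetical. The sketch uses an in-memory database so that it leaves no file behind. For a real exchange you would connect to a file path instead, and R could then open the same file, e.g. with DBI's dbConnect(RSQLite::SQLite(), path) followed by dbReadTable().

```python
import sqlite3

# ":memory:" keeps this example self-contained. For a real exchange with R,
# pass a file path instead so that both languages can open the same database.
conn = sqlite3.connect(":memory:")

rows = [("A", 4.0), ("B", 3.0), ("C", 6.0)]  # hypothetical sample records
conn.execute("CREATE TABLE tbl (label TEXT, value REAL)")
conn.executemany("INSERT INTO tbl (label, value) VALUES (?, ?)", rows)
conn.commit()

# Query the data back, sorted by value in descending order.
result = conn.execute("SELECT label, value FROM tbl ORDER BY value DESC").fetchall()
print(result)
conn.close()
```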
Fractional Logit Model with Python
In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')
In [4]: Y = data.LEV_LT3
In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ', 'IND3A']])
In [6]: # Discrete Dependent Variable Models with Logit Link
In [7]: mod = sm.Logit(Y, X)
In [8]...
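A fractional logit model keeps the logit link but lets the response take any value in [0, 1], estimating the coefficients by maximizing the Bernoulli quasi-log-likelihood (the Papke-Wooldridge quasi-ML approach). Below is a minimal pure-Python Newton-Raphson sketch with one predictor and hypothetical data. It is not the statsmodels code from the post, and it omits the robust (sandwich) standard errors usually reported with this estimator.

```python
import math


def fit_fractional_logit(x, y, max_iter=50, tol=1e-10):
    """Quasi-ML fractional logit with an intercept and one predictor.

    Maximizes sum(y*log(p) + (1-y)*log(1-p)) with p = 1/(1+exp(-(b0 + b1*x)))
    by Newton-Raphson. Each y may be any value in [0, 1], not just 0 or 1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score (gradient) and negative Hessian of the quasi-log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - p
            w = p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1].
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: a fractional response (e.g., a ratio) in [0, 1].
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.10, 0.25, 0.40, 0.60, 0.80]

b0, b1 = fit_fractional_logit(x, y)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
# With an intercept, the score equation forces mean(fitted) to equal mean(y).
print(sum(fitted) / len(fitted), sum(y) / len(y))
```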
Generalized Boosted Regression with A Monotonic Marginal Effect for Each Predictor
In risk modeling practice, it is sometimes mandatory to maintain a monotonic relationship between the response and each predictor. Below is a demonstration of how to develop a generalized boosted regression with a monotonic marginal effect for each predictor.
##################################################
# FIT A GENERALIZED BOOS...
Removing Records by Duplicate Values
Removing records from a data table based on duplicate values in one or more columns is a common and important data cleaning technique. Below is an example showing how to accomplish this task in SAS, R, and Python, respectively.
SAS Example
data _data_;
input label $ value;
datalines;
A 4
B 3
C 6
B 3
B 1
A 2
A ...
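As a language-neutral illustration, here is a minimal plain-Python sketch that uses only the six complete rows visible in the SAS datalines above (the rest of the data is elided). Which record to keep is an assumption here. The sketch shows three common variants: keeping the first occurrence per label, keeping the smallest value per label by sorting first, and dropping rows that are duplicated across all columns.

```python
# The six complete rows visible in the SAS datalines excerpt (the rest is elided).
records = [("A", 4), ("B", 3), ("C", 6), ("B", 3), ("B", 1), ("A", 2)]


def dedup_keep_first(rows, key):
    """Keep the first row for each distinct key value, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


def by_label(row):
    """Judge duplicates on the label column only."""
    return row[0]


def whole_row(row):
    """Judge duplicates on all columns."""
    return row


first_per_label = dedup_keep_first(records, by_label)
min_per_label = dedup_keep_first(sorted(records), by_label)  # sort, then keep first
distinct_rows = dedup_keep_first(records, whole_row)

print(first_per_label)  # [('A', 4), ('B', 3), ('C', 6)]
print(min_per_label)    # [('A', 2), ('B', 1), ('C', 6)]
print(distinct_rows)    # [('A', 4), ('B', 3), ('C', 6), ('B', 1), ('A', 2)]
```

The sort-then-keep-first variant mirrors the usual R idiom of ordering the data frame and then filtering with !duplicated() on the key column.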
Removing Records by Duplicate Values in R – An Efficiency Comparison
After posting “Removing Records by Duplicate Values” yesterday, I had an interesting exchange with my friend Jeffrey Allard tonight about how to code this in R: a combination of order() and duplicated(), or sqldf(). Afterward, I did a simple efficiency comparison between the two methods, shown below. The comparison result is pretty sel...
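Since sqldf() runs its SQL through SQLite by default, a rough Python analogue of this comparison pits a sort-and-scan approach against an SQLite GROUP BY query. This is a sketch under stated assumptions: the data are randomly generated, and the rule "keep the smallest value per label" is assumed rather than taken from the post. Timings depend entirely on the machine, so no expected numbers are given.

```python
import random
import sqlite3
import timeit

# Hypothetical test data: 100,000 (label, value) rows drawn from 1,000 labels.
random.seed(2013)
rows = [(f"L{random.randrange(1000)}", random.random()) for _ in range(100_000)]


def sort_and_dedup(data):
    """Analogue of order() + !duplicated(): sort, then keep the first row per label."""
    kept, last = [], None
    for label, value in sorted(data):
        if label != last:
            kept.append((label, value))
            last = label
    return kept


def sql_dedup(data):
    """Analogue of an sqldf() query: keep the minimum value per label via SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (label TEXT, value REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", data)
    out = conn.execute(
        "SELECT label, MIN(value) FROM t GROUP BY label ORDER BY label"
    ).fetchall()
    conn.close()
    return out


# Confirm that both approaches return identical results before comparing speed.
assert sort_and_dedup(rows) == sql_dedup(rows)

for fn in (sort_and_dedup, sql_dedup):
    seconds = timeit.timeit(lambda: fn(rows), number=3) / 3
    print(f"{fn.__name__}: {seconds:.3f} s per run")
```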