Publications by heuristicandrew
Compcache on Ubuntu on Amazon EC2
The following fully-automatic Bash script downloads, compiles, and initializes compcache version 0.6.2 on Ubuntu Karmic Koala (9.10). This script creates two swaps with a maximum of 4GB uncompressed size each. Two swaps are used to take advantage of 2 CPUs (or CPU cores in a multicore CPU). Compcache is a fascinating memory compression system. ...
Text Data Mining with Twitter and R
Twitter is a favorite source of text data for analysis: it’s popular (there is a huge volume and variety of content on all topics) and easily accessible through Twitter’s free, open APIs, which return data in the easily consumed JSON and ATOM formats. Some …
Benchmarking R, Revolution R, and HyperThreading for data mining
Usually data mining benchmarks measure lift, precision, etc., but wasting analyst time hurts the ROI of any project. I recently upgraded my notebook (where I often use R for data mining) and was faced with two questions: for the fastest …
Two browsers for R help documentation
The same excellent documentation for R commands is available through two different help browsers: text and HTML. Let’s see how each looks and works, and how to switch the default. Look and feel Here is how both look for …
Basic line chart with ggplot2
ggplot2 is an R package that easily draws plots that are easier on the eyes than those from R’s built-in plotting functions, though its grammar differs from what is commonly used in R. This code demonstrates how to prepare a …
Paired sample t-test in R
Let’s walk through using R and Student’s t-test to compare paired sample data. The book Statistics: The Exploration & Analysis of Data (6th edition, p505) presents the longitudinal study “Bone mass is recovered from lactation to postweaning in adolescent mothers …
Model decision tree in R, score in Base SAS
This code creates a decision tree model in R using party::ctree() and prepares the model for export from R to Base SAS, so SAS can score new records. SAS Enterprise Miner and PMML are not required, and Base SAS …
Confidence interval diagram in R
This code shows how to easily plot a beautiful confidence interval diagram in R. First, let’s input the raw data. We’ll be making two confidence intervals for two samples of 10. In case you are curious, the data represent samples from …
Train neural network in R, predict in SAS
This R code fits an artificial neural network in R and generates Base SAS code, so new records can be scored entirely in Base SAS. This is intended to be a simple, elegant, fast solution. You don’t need SAS Enterprise …
Using neural network for regression
Artificial neural networks are commonly thought to be used just for classification because of the relationship to logistic regression: neural networks typically use a logistic activation function and output values from 0 to 1 like logistic regression. However, the worth …
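The point about the logistic activation in the last excerpt can be illustrated with a small sketch. It is written in Python rather than R, and it is not taken from the post itself: the function names, weights, and inputs are hypothetical, and using a linear (identity) output unit for regression is a common approach assumed here, not something the excerpt confirms.

```python
import math


def logistic(z: float) -> float:
    """Logistic (sigmoid) function: maps a real number z into the interval (0, 1)."""
    # Note: math.exp(-z) can overflow for very large negative z; fine for a sketch.
    return 1.0 / (1.0 + math.exp(-z))


def unit_output(inputs, weights, bias, linear=False):
    """Weighted sum of the inputs, optionally passed through the logistic function.

    linear=False: logistic output, bounded between 0 and 1 (classification style).
    linear=True:  raw weighted sum, unbounded (one common choice for regression).
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return z if linear else logistic(z)


# Hypothetical inputs and weights, chosen only for illustration.
inputs = [2.0, -1.0]
weights = [1.5, 0.5]
bias = 0.5

squashed = unit_output(inputs, weights, bias)                 # bounded in (0, 1)
unsquashed = unit_output(inputs, weights, bias, linear=True)  # plain weighted sum

print(f"logistic output: {squashed:.4f}")
print(f"linear output:   {unsquashed:.4f}")
```

With a logistic output unit, a numeric target outside the 0 to 1 range would have to be rescaled first; a linear output unit avoids that step.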