Publications by heuristicandrew

Compcache on Ubuntu on Amazon EC2

04.05.2010

The following fully automatic Bash script downloads, compiles, and initializes compcache version 0.6.2 on Ubuntu Karmic Koala (9.10). The script creates two swap devices, each with a maximum uncompressed size of 4 GB. Two swap devices are used to take advantage of two CPUs (or two cores in a multicore CPU). Compcache is a fascinating memory compression system. ...


Text Data Mining with Twitter and R

08.04.2011

Twitter is a favorite source of text data for analysis: it’s popular (there is a huge volume and variety of posts on all topics) and easily accessible through Twitter’s free, open APIs, which return data in easily consumed JSON and Atom formats. Some …


Benchmarking R, Revolution R, and HyperThreading for data mining

27.06.2011

Usually data mining benchmarks measure lift, precision, etc., but wasting analyst time hurts the ROI of any project. I recently upgraded my notebook (where I often use R for data mining) and was faced with two questions: for the fastest …


Two browsers for R help documentation

29.06.2011

The same excellent documentation for R commands is available through two different help browsers: text and HTML. Let’s see how each looks and works, and how to switch the default. Look and feel: here is how both look for …


Basic line chart with ggplot2

27.09.2011

ggplot2 is an R package that draws plots that are easier on the eyes than those from R’s built-in plotting functions, though its grammar differs from what is commonly used in R. This code demonstrates how to prepare a …


Paired sample t-test in R

28.09.2011

Let’s walk through using R and Student’s t-test to compare paired sample data. The book Statistics: The Exploration & Analysis of Data (6th edition, p. 505) presents the longitudinal study “Bone mass is recovered from lactation to postweaning in adolescent mothers …
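
The paired comparison described above can be sketched in a few lines. This is an illustration in plain Python rather than the post's R code, and the measurements are made up for the example (they are not the bone-mass data from the book); the function name `paired_t_statistic` is likewise hypothetical. It computes the t statistic from the per-pair differences, which is the core of a paired-sample t-test.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired-sample t-test.

    The test works on the per-pair differences: t is the mean difference
    divided by its standard error.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    if len(before) < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements for illustration only.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [11.0, 14.0, 10.0, 11.0, 15.0]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f}, df = {df}")
```

In R itself, `t.test(x, y, paired = TRUE)` performs the same test and also reports the p-value and confidence interval.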


Model decision tree in R, score in Base SAS

11.10.2011

This code creates a decision tree model in R using party::ctree() and prepares the model for export from R to Base SAS, so SAS can score new records. SAS Enterprise Miner and PMML are not required, and Base SAS …


Confidence interval diagram in R

19.10.2011

This code shows how to easily plot a beautiful confidence interval diagram in R. First, let’s input the raw data. We’ll be making two confidence intervals for two samples of 10. In case you are curious, the data represents samples from …


Train neural network in R, predict in SAS

11.11.2011

This R code fits an artificial neural network and generates Base SAS code, so new records can be scored entirely in Base SAS. This is intended to be a simple, elegant, fast solution. You don’t need SAS Enterprise …


Using neural network for regression

17.11.2011

Artificial neural networks are commonly thought to be used just for classification because of their relationship to logistic regression: neural networks typically use a logistic activation function and, like logistic regression, output values from 0 to 1. However, the worth …
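
To make the 0-to-1 output range concrete, here is a minimal Python sketch of the logistic activation. The `rescale` and `unscale` helpers are hypothetical and show one common workaround for regression targets (min-max scaling the target into [0, 1] and mapping predictions back); the excerpt does not say whether the post takes this approach, so treat it as an assumption.

```python
import math


def logistic(x):
    """Logistic (sigmoid) activation: 1 / (1 + e^-x), bounded between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rescale(y, lo, hi):
    """Min-max scale a target value from [lo, hi] into [0, 1] (assumed workaround)."""
    return (y - lo) / (hi - lo)


def unscale(p, lo, hi):
    """Map a network output in [0, 1] back to the original target range."""
    return lo + p * (hi - lo)


for x in (-5.0, 0.0, 5.0):
    print(x, logistic(x))

# Hypothetical target range of 100 to 200.
print(unscale(rescale(150.0, 100.0, 200.0), 100.0, 200.0))
```

Because the logistic output never leaves the 0-to-1 band, a numeric target with a wider range has to be either rescaled as above or predicted through a linear output unit.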
