Publications by heuristicandrew

SAS: “The query requires remerging summary statistics back with the original data”

22.09.2009

Coming from a background of writing SQL code directly for “real” RDBMSs (Microsoft SQL Server, MySQL, and SQLite), I was initially confused when SAS gave me the following ‘note’ for a simple summary PROC SQL query: proc sql; create table undel_monthly as select year(date) as year, month(d...

1577 sym R (2167 sym/3 pcs) 12 img

Delete rows from R data frame

08.10.2009

Deleting rows from a data frame in R is easy by combining simple operations. Let’s say you are working with the built-in data set airquality and need to remove rows where the ozone is NA (also called null, blank or missing). The method is conceptually different from a SQL database that has a dedicated […]

731 sym 2 img
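
A minimal sketch of the row deletion described in the entry above, assuming the usual logical-indexing approach (the post's own code is truncated in this listing, so this is an illustration, not the author's code). The names has_ozone and airquality_clean are made up for the example.

```r
# Built-in data set of daily air quality measurements
data(airquality)

# Logical vector: TRUE where Ozone is present, FALSE where it is NA
has_ozone <- !is.na(airquality$Ozone)

# Keep only the rows with an Ozone value; the trailing comma keeps all columns
airquality_clean <- airquality[has_ozone, ]

# Row counts before and after the deletion
nrow(airquality)
nrow(airquality_clean)
```

Unlike a SQL DELETE, nothing is removed in place: the subset is a new data frame, and the original stays untouched unless it is overwritten by assignment.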

“Outlook cannot open this item.” and tasks missing

08.10.2009

Recently Microsoft Office Outlook 2007 started giving me the vague error message “Outlook cannot open this item. The item may be damaged.” The message would appear randomly throughout the day. Sometimes five error message boxes would be stacked up on top of each other. OK, but which item? What kind of item? Is it an email, appointment, or task?...

1611 sym 12 img

Plot ROC curve and lift chart in R

18.12.2009

This tutorial with real R code demonstrates how to create a predictive model using cforest (Breiman’s random forests) from the package party, evaluate the predictive model on a separate set of data, and then plot the performance using ROC curves and a lift chart. These charts are useful for evaluating model performance in data minin...

762 sym 2 img

Compare performance of machine learning classifiers in R

23.12.2009

This tutorial demonstrates to the R novice how to create five machine learning models for classification and compare the performance graphically with ROC curves in one chart. For a simpler introduction, start with Plot ROC curve and lift chart in R. # load the mlbench package which has the BreastCancer data set require(mlbench) # if [...

761 sym 2 img

Error : .onLoad failed in ‘loadNamespace’ for ‘RWeka’

24.12.2009

After installing Weka/RWeka in R, you may get this error if you try to load RWeka in the same session: require(RWeka) Cannot create Java virtual machine (-4) Error : .onLoad failed in 'loadNamespace' for 'RWeka' Solution: Just close R and re-open it. Cause: Apparently the installation requires some initialization. Tested on R 2.10.1 on Windows...

759 sym 12 img

R: Memory usage statistics by variable

04.01.2010

Do you need a way to find out which individual variables in R consume the most memory? # create dummy variables for demonstration x...

555 sym 1 img
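
One way to approach the question in the entry above, sketched under the assumption that base R's object.size() is an acceptable measure (the post's own code is cut off in this listing). The three objects and their names are hypothetical and exist only for the demonstration.

```r
# Hypothetical demonstration objects (names and sizes are illustrative only)
small_vec <- 1:10
big_vec <- rnorm(1e5)
some_text <- letters

# Names of the objects to measure
object_names <- c("small_vec", "big_vec", "some_text")

# object.size() reports an estimate of each object's memory use in bytes
sizes_bytes <- sapply(
  object_names,
  function(name) as.numeric(object.size(get(name)))
)

# Largest consumers first
sort(sizes_bytes, decreasing = TRUE)
```

In an interactive session, replacing object_names with ls() would measure every object in the current workspace instead of a fixed list.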

Setting the HTML title tag in SAS ODS (the right way)

05.01.2010

In our department and various places on the Intertubes, SAS programmers set the HTML title tag (which sets the title in web browsers and on search engines) in ODS using the headtext option: ods html headtext="<title>My great report</title>" /* wrong! */ file="foo.html"; This may work in some situations, but it’s ugly and wrong. To see why,...

1370 sym R (471 sym/3 pcs) 12 img

Weighting model fit with ctree in party

15.03.2010

Conditional inference trees (ctree) in the package party allow weighting, which is useful when one classification outcome is more important than another. Useful examples are not difficult to imagine: in a marketing direct mailing, a false positive (non-response) costs just paper and postage (say, $0.50) while a true positive (response) ma...

787 sym 2 img

Validating credit card numbers in SAS

16.03.2010

Major credit card issuing networks (including Visa, MasterCard, Discover, and American Express) allow simple credit card number validation using the Luhn algorithm (also called the “modulus 10” or “mod 10” algorithm). The following code demonstrates an implementation in SAS. The code also validates the credit card number by length and b...

1090 sym Python (1930 sym/1 pcs) 12 img
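
The post's SAS implementation is not reproduced in this listing. As a rough illustration of the Luhn check itself, here is a sketch in R instead of SAS. The function name luhn_valid is hypothetical, and the sketch covers only the checksum, not the additional validation by length that the excerpt mentions.

```r
# Returns TRUE when `number` (a single character string of digits)
# passes the Luhn "mod 10" check
luhn_valid <- function(number) {
  # Reject anything that is not a non-empty string of digits
  if (!grepl("^[0-9]+$", number)) return(FALSE)

  # Split into single digits, rightmost digit first
  digits <- rev(as.integer(strsplit(number, "")[[1]]))

  # Double every second digit, counting from the right
  second <- seq_along(digits) %% 2 == 0
  digits[second] <- digits[second] * 2L

  # A doubled value above 9 contributes the sum of its two digits (value - 9)
  digits[digits > 9] <- digits[digits > 9] - 9L

  # Valid when the total is a multiple of 10
  sum(digits) %% 10 == 0
}

luhn_valid("4111111111111111")  # a commonly used Visa test number
luhn_valid("4111111111111112")  # same number with the last digit altered
```

Passing the number as a character string, not a numeric value, avoids losing precision on 16-digit inputs.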