Publications by Allan Engelhardt
R tips: Keep your packages up-to-date
In this entry in a small series of tips for the use of the R statistical analysis and computing tool, we look at how to keep your add-on packages up-to-date. One of the great strengths of R is the many packages available. All the new approaches, as well as some of the best implementations of your old favorites, are there. But it can also be a l...
2754 sym R (1144 sym/6 pcs) 14 img
R tips: Eliminating the “save workspace image” prompt on exit
When using R, the statistical analysis and computing platform, I find it really annoying that it always prompts to save the workspace when I exit. This is how I turn it off. I wish there were an option to change the default of the q/quit functions. I start and stop R frequently and so the exit question which I have to answer every time is real...
1531 sym R (99 sym/3 pcs) 14 img
R tips: Swapping columns in a matrix
Using R, the statistical analysis and computing platform, swapping two columns in a matrix is really easy: m[ , c(1,2)] <- m[ , c(2,1)]. Note, however, that this does not swap the column names (if you have any) but only the values. You could do something like colnames(m)[c(1,2)] <- colnames(m)[c(2,1)] if you need the names changed as well, bu...
807 sym R (32 sym/1 pcs) 14 img
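As a minimal sketch of the column swap described in the entry above, the following uses a small made-up matrix (the names `m` and `reordered` and the data are illustrative assumptions, not taken from the post). It shows that assigning swapped columns moves only the values. The final step, reordering by column index, is one possible way to carry the names along; the post's own suggestion is truncated, so this may not be what the author proposes.

```r
# Hypothetical 2 x 3 matrix with named columns a, b, c
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# Swap the values of columns 1 and 2; the column names stay where they are
m[, c(1, 2)] <- m[, c(2, 1)]

# Swap the names as well, as in the colnames() expression quoted above
colnames(m)[c(1, 2)] <- colnames(m)[c(2, 1)]

# Alternative (an assumption, not from the post): reorder by index,
# which moves values and names together in one step
original  <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
reordered <- original[, c(2, 1, 3)]
```

After these steps `m` and `reordered` should hold the same values under the same column names.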
SNA with R: Loading your network data
We are interested in Social Network Analysis using the statistical analysis and computing platform R. As usual with R, the documentation is pretty bad, so this series collects our notes as we learn more about the available packages and how they work. We use here the statnet group of packages, which seems to be the most comprehensive and most ac...
5604 sym R (2984 sym/6 pcs) 14 img
SNA with R: Loading large networks using the igraph library
We are interested in Social Network Analysis using the statistical analysis and computing platform R. The documentation for R is voluminous but typically not very good, so this entry is part of a series where we document what we learn as we explore the tool and the packages. In our previous post on SNA we gave up on using the statnet package b...
2922 sym R (1917 sym/4 pcs) 16 img
Data.gov
I am always on the lookout for useful data sources for training in statistics, so I am excited that Data.gov has opened for business. The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the US Government. This is a great initiative which I look forward to exploring when I am not in a tiny a...
1213 sym 18 img
R tips: Use read.table instead of strsplit to split a text column into multiple columns
Someone on the R-help mailing list had a data frame with a column containing IP addresses in quad-dot format (e.g. 1.10.100.200). He wanted to sort by this column and I proposed a solution involving strsplit. But Peter Dalgaard came up with a much nicer method using read.table on a textConnection object: > a <- data.frame(cbind(color=c("yell...
775 sym R (483 sym/1 pcs) 14 img
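The read.table approach mentioned in the entry above can be sketched as follows. This is an illustration under stated assumptions, not the code from the post (which is truncated): the vector `ips`, the column names `o1` to `o4`, and the sample addresses are all made up for the example.

```r
# Hypothetical IP addresses in quad-dot format
ips <- c("10.0.0.2", "1.10.100.200", "1.2.3.4")

# Read the strings as if they were a file whose fields are separated by "."
con <- textConnection(ips)
octets <- read.table(con, sep = ".", col.names = c("o1", "o2", "o3", "o4"))
close(con)

# Order numerically by each octet in turn, then apply that order to the strings
ord <- order(octets$o1, octets$o2, octets$o3, octets$o4)
sorted <- ips[ord]
```

Because each octet becomes its own numeric column, the ordering compares 2 with 10 as numbers rather than as text, which is the point of splitting the column before sorting.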
R used by KDD 2009 cup winner of slow challenge
The results from the KDD Cup 2009 challenge (which we wrote about before) are in, and the winner of the slow challenge used the R statistical computing and analysis platform for their winning submission. The write-up (username/password may be required) from Hugh Miller and team at the University of Melbourne includes these points: Decision tr...
3133 sym 16 img
How to win the KDD Cup Challenge with R and gbm
Hugh Miller, the team leader of the winner of the KDD Cup 2009 Slow Challenge (which we wrote about recently), kindly provides more information about how to win this public challenge using the R statistical computing and analysis platform on a laptop (!). As a reminder of what we wrote before, the challenge provided two anonymized data sets each ...
3359 sym 16 img
Data Mashups in R from O’Reilly
O’Reilly has published Data Mashups in R as a $4.99 PDF download in their Short Cut series. In 27 pages it takes you through an example of how to combine foreclosure information with maps and geographical information to produce plots like the one below. This is all done with the R statistical computing and analysis platform. They show how t...
1040 sym 16 img