Publications by Abhijit
Workflow with Python and R
I seem to be doing more and more with Python for work over and above using it as a generic scripting language. R has been my workhorse for analysis for a long time (15+ years in various incarnations of S+ and R), but it still has some deficiencies. I’m finding Python easier and faster to work with for large data sets. I’m also a bit happier w...
3496 sym 16 img
R amusements
On a lark, and to kill a bit of time, I was running the R fortune command looking for references to SAS. Here’s what two successive random fortunes turned up. Can there be two more antipodal opinions about the same product? I laughed out loud. > fortune(‘SAS’) There are companies whose yearly license fees to SAS total millions of dollars. ...
1404 sym 16 img
Quick and dirty parallel processing in R
R has some powerful tools for parallel processing, which I discovered while searching for ways to fully utilize my 8-core computer at work. What surprised me is how easy it is…about 6 lines of code, if that. Given that I wasn’t allowed to install heavy duty parallel-processing systems like MPICH on the computer, I found that the library SNOW ...
1860 sym 16 img
A small customization of ESS
JD Long (at Cerebral Mastication) posted a question on Twitter about an artifact in ESS, where typing “_” gets you “ Type “_” twice, which puts in the underscore Use “C-q _”, i.e. Ctrl-q then underscore Put (setq ess-S-assign "_") in your .emacs file The last fix obviously customizes ESS permanently for your emacs setup, while the...
1359 sym 18 img
useR! 2010 done and dusted
The useR! 2010 R users conference just finished up this afternoon with a thought-provoking, controversial, and sometimes hilarious talk by Richard Stallman of GNU fame. It started on Tuesday with great tutorials (I took ones on MICE for multiple imputation and Frank Harrell’s excellent regression modeling). In between these bookends was a wonde...
2689 sym 16 img
ggplot2 joy
I’ve been working on a long-term (25+yr) longitudinal study of rheumatoid arthritis with my boss. He just walked in and asked if I could create a plot showing the trajectory of pain scores over time for each subject, separated by educational level (4 groups). Having now worked with ggplot2 for a while, and learning more at the last two DC useR ...
1481 sym 6 img
The split-apply-combine paradigm in R
Last night at the DC R Users meetup, which was our largest meetup to date, I gave an introductory presentation on data munging, and spent a bit of time on the split-apply-combine paradigm that I use almost daily in my work. I talked mainly about the packages plyr and doBy, which I use a lot now. David Smith posted a link on the Revolution blog to...
1133 sym 4 img
RStudio: a cut above
As most followers of R-bloggers.com and the Twitter #rstats know by now, RStudio is a new open-source IDE for R that was beta-released yesterday. I have started putting it through its paces within my R workflow, and my impressions are more than favorable. I also tried it out on my home Linux server in server mode. RStudio is obviously designed by...
2135 sym 4 img
An enhanced Kaplan-Meier plot
We often see, in publications, a Kaplan-Meier survival plot, with a table of the number of subjects at risk at different time points aligned below the figure. I needed this type of plot (or really, matrices of such plots) for an upcoming publication. Of course, my preferred toolbox was R and the ggplot2 package. There were other attempts to do th...
4207 sym R (5282 sym/2 pcs) 4 img
SAS, R and categorical variables
One of the disappointing problems in SAS (as I need PROC MIXED for some analysis) is to recode categorical variables to have a particular reference category. In R, my usual tool, this is rather easy both to set and to modify using the relevel command available in base R (in the stats package). My understanding is that this is actually easy in S...
1289 sym 4 img