Publications by wrathematics

Autoplot: Graphical Methods with ggplot2

11.06.2012

Background: As of ggplot2 0.9.0, released in March 2012, there is a new generic function, autoplot. This uses R’s S3 methods (which are essentially OOP for babies) to let you have some simple overloading of functions. I’m not going to get deep into OOP, because honestly we don’t need to. The idea is very simple. If I say “I’m sending ...

5030 sym R (7220 sym/15 pcs) 6 img

Some Quirks of the R Language

14.08.2012

R is my favorite programming language. It’s just so useful for getting work done. Sometimes people will complain that R is a difficult language. To me, this raises the questions: difficult for what? And for whom? I personally think R is just about the easiest thing in the world for prototyping. Meaning if you want to quickly crank o...

5754 sym R (810 sym/7 pcs)

R at 12,000 Cores

16.10.2012

I am very happy to introduce a new set of packages that has just hit the CRAN. We are calling it the Programming with Big Data in R Project, or pbdR for short (or as I like to jokingly refer to it, ‘pretty bad for dyslexics’). You can find out more about the pbdR project at http://r-pbd.org/. The packages are a natural programming framework t...

6644 sym

pbdR Updates – Distributed lm.fit() and More

03.12.2012

Over the weekend, we updated all of the pbdR packages currently available on the CRAN. The updates include tons of internal housecleaning as well as many new features. Notably, pbdBASE_0.1-1 and pbdDMAT_0.1-1 were released, which contain lm.fit() methods. This function in particular has been available at my github for over a month, but didn’...

6077 sym

Intentionally Writing Obtuse Code

09.12.2013

Sometimes intentionally writing bad code can be a lot of fun. Now here, when I say “bad”, I mean something that’s functional but completely incoherent to anything but the machine. There are even competitions for this kind of thing, but I only consider myself a dabbler in this dark art. Thankfully, it’s often pretty easy to make obtuse cod...

3417 sym R (1816 sym/6 pcs)

Rules for Naming Objects in R

16.12.2013

Naming Rules in R: How are objects allowed to be named in R? As it turns out, this is a very different question from how should objects be named. This isn’t about style conventions, camelCase, dots.versus_underscores, or anything like that; this is about what is strictly possible. I do a lot of outreach to HPC people who are starting to get an in...

4504 sym R (201 sym/8 pcs)

How to Make a Bad Password with R

24.02.2014

I have a lot of projects that will take ages to finish (some are in such poor shape that I tuck them away in private repositories, so no one can see my shame).  So sometimes it’s nice to just take a weekend and crank out something start to finish, even if it’s dumb and no one cares about it and fewer people want it.  Which brings us to the ...

4520 sym R (306 sym/2 pcs) 2 img

Searching an R Function’s Source Code

01.05.2014

This is not nearly as interesting as it might first sound, but every function in R contains R code; this is true of core R code as well as extension packages. Sometimes the R code is just a very shallow wrapper around some compiled code, such as in sum() and is.null(). Other times, as in lm.fit(), there is a vast expanse of R code. It’s easy ...

1704 sym R (4115 sym/6 pcs)

Modern Applied Statistics in R’lyeh

30.06.2014

So you’ve probably heard of King James Programming; if not, you should check it out because it’s great. A quick summary is that someone took the King James Bible and Sussman’s Structure and Interpretation of Computer Programs (SICP) and used an n-gram babbler to generate new sentences that combine the texts in amusing ways. The generator it...

3831 sym R (63 sym/1 pcs) 2 img

Advanced R Profiling with pbdPAPI

22.07.2014

R has some extremely useful utilities for profiling, such as system.time(), Rprof(), the often overlooked tracemem(), and the rbenchmark package. But if you want more than just simple timings of code execution, you will mostly have to look elsewhere. One of the best sources for profiling data is hardware performance counters, available in most mo...

5922 sym R (1086 sym/9 pcs) 2 img