Publications by richierocks

(Almost) Friday Function: alarm

21.04.2011

Last week I decided to start a weekly column detailing an interesting function each Friday, entirely forgetting that I would be on holiday, without internet access (shock horror!), tomorrow. So here’s your column a little early. The alarm function is something of a novelty, in that all it does is to make an annoying noise when you call it. Th...

1205 sym R (60 sym/1 pcs) 16 img

Friday function triple bill: with vs. within vs. transform

29.04.2011

When you first learnt about data frames in R, I’m sure that, like me, you thought “This is a lot of hassle having to type the names of data frames over and over in order to access each column”.

    library(MASS)
    anorexia$wtDiff <- anorexia$Postwt - anorexia$Prewt #I have to type anorexia how many times?

Indeed, any time you see chunks of code ...
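As a rough sketch of what the title's three functions do with the same calculation (the variable names `wt_diff`, `anorexia_within` and `anorexia_transform` are made up for illustration and are not from the post):

```r
library(MASS)  # provides the anorexia data frame

# with() evaluates an expression inside the data frame and returns the result
wt_diff <- with(anorexia, Postwt - Prewt)

# within() evaluates assignments inside the data frame
# and returns a modified copy of it
anorexia_within <- within(anorexia, wtDiff <- Postwt - Prewt)

# transform() adds or changes columns passed as name = value arguments
anorexia_transform <- transform(anorexia, wtDiff = Postwt - Prewt)
```

In all three cases `anorexia` is typed only once, and the original data frame is left untouched.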

2406 sym R (834 sym/4 pcs) 16 img

Friday Function: nclass

06.05.2011

When you draw a histogram, an important question is “how many bars should I draw?”. This should inspire an indignant response. You didn’t become a programmer to answer questions, did you? No. The whole point of programming is to let your computer do your thinking for you, giving you more time to watch videos of fluffy kittens. Fortunately, R...
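The excerpt cuts off here, but going by the title, a minimal sketch of the `nclass` family might look like the following. The choice of the built-in `faithful` dataset is an assumption made for the example, not something taken from the post.

```r
# Old Faithful eruption durations, a dataset that ships with R
x <- faithful$eruptions

# Three rules for suggesting a number of histogram bars
n_sturges <- nclass.Sturges(x)  # based only on the number of observations
n_scott   <- nclass.scott(x)    # based on the standard deviation
n_fd      <- nclass.FD(x)       # based on the interquartile range

print(c(Sturges = n_sturges, Scott = n_scott, FD = n_fd))

# hist() accepts the same rules by name; plot = FALSE only computes the bins
h <- hist(x, breaks = "FD", plot = FALSE)
```

Note that `hist` treats the number as a suggestion and rounds the breakpoints to "pretty" values, so the bar count you see may differ slightly from what the `nclass` function returns.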

1983 sym R (1008 sym/4 pcs) 16 img

A clock utility, via console hackery

11.05.2011

A discussion on StackOverflow today shows an interesting use of special characters inside the cat function. The most common special characters that you may have come across are the tab and newline characters, represented by \t and \n respectively. Try them for yourself.

    cat("Red\tlorry\nYellow\tlorry\n")

cat also respects the backspace charac...
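To give a flavour of how backspaces could drive a console clock, here is a minimal sketch. The function name `show_clock` and its arguments are hypothetical, and this is not necessarily the approach from the StackOverflow discussion. How `\b` is rendered depends on your console, so it may not overwrite cleanly everywhere.

```r
# Hypothetical helper: print the time, wait, then back up over it.
show_clock <- function(n_ticks = 3, pause = 1) {
  for (i in seq_len(n_ticks)) {
    now <- format(Sys.time(), "%H:%M:%S")
    cat(now)
    flush.console()   # make sure the text appears before we sleep
    Sys.sleep(pause)
    # one backspace per printed character moves the cursor back to the start
    cat(strrep("\b", nchar(now)))
  }
  cat("\n")
  invisible(NULL)
}

show_clock(n_ticks = 3, pause = 1)
```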

1158 sym R (286 sym/3 pcs) 16 img

Tracking execution paths

18.06.2011

Earlier this week, I was trying to figure out the path of execution through a big chunk of code. Once you reach a certain size of codebase, tracking which function gets called when can be tricky. My first thought for dealing with this was to add a message line at the start of each function that I wanted to track. (Note: message, not cat!) f <...
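A minimal sketch of that first idea, assuming two toy functions `f` and `g` (the bodies here are invented for illustration, since the excerpt is cut short):

```r
# Each function announces itself before doing any work
f <- function() {
  message("f called")
  g()
}

g <- function() {
  message("g called")
  invisible(NULL)
}

f()

# Messages go to stderr and can be silenced without touching the functions
suppressMessages(f())
```

That last line is one reason to prefer `message` over `cat`: the tracking output can be switched off by the caller, and it does not get mixed into anything the code writes to standard output.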

2115 sym R (1249 sym/6 pcs) 16 img

Testing for valid variable names

03.07.2011

I have something of a fondness for ridiculous variable names, so it’s useful to be able to check whether my latest concoction is legitimate. More so if it is automatically generated. Not having an is_valid_variable_name function is one of those odd omissions from R, and the assign function doesn’t check validity. To recap, there are a few rules ...
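One possible sketch of such a check, which may well differ from the version developed in the post, leans on base R's `make.names`: a name that `make.names` leaves unchanged is syntactically valid. This simplified version does not check name length or treat `...` and `..1`-style names specially.

```r
# Sketch only: TRUE where make.names() has nothing to fix
is_valid_variable_name <- function(x) {
  x <- as.character(x)
  x == make.names(x)
}

is_valid_variable_name(c("x", "if", "_x", ".2x", "my.var"))
# "if" is a reserved word, "_x" starts with an underscore,
# and ".2x" starts with a dot followed by a digit
```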

4525 sym R (1977 sym/10 pcs) 16 img

The method in the mirror: reflection in R

17.07.2011

Reflection is a programming concept that sounds scarier than it is. There are three related concepts that fall under the umbrella of reflection, and I’ll be surprised if you haven’t come across most of these code ideas already, even if you didn’t know it was called reflection. The first concept is examination of your variables. In R, this ...

1841 sym R (141 sym/3 pcs) 16 img

The Stats Clinic

27.07.2011

Here at HSL we have a lot of smart kinda-numerate people who have access to a lot of data. On a bad day, kinda-numerate includes myself, but in general I’m talking about scientists who have done an introductory stats course, but not much else. When all you have is a t-test, suddenly everything looks like two groups of normally distribute...

1519 sym 18 img

Monster functions (Raaargh!)

12.08.2011

It’s widely considered good programming practice to have lots of little functions rather than a few big functions. The reasons behind this are simple. When your program breaks, it’s much nicer to debug a five line function than a five hundred line function. Additionally, by breaking up your code into little chunks, you often find that some...

3116 sym R (715 sym/5 pcs) 18 img

Stop! (In the name of a sensible interface)

12.08.2011

In my last post I talked about using the number of lines in a function as a guide to whether you need to break it down into smaller pieces. There are many other useful metrics for the complexity of a function, most notably cyclomatic complexity, which tracks the number of different routes that code can take. It’s non-trivial to calculate such...
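The line-count guide mentioned at the start of this excerpt can be roughly approximated by deparsing a function and counting the resulting lines. The helper name `count_function_lines` is hypothetical, and because `deparse` reformats the code, the count is only an approximation of what is in the source file.

```r
# Sketch only: approximate a function's length by its deparsed line count
count_function_lines <- function(fn) {
  stopifnot(is.function(fn))
  length(deparse(fn))
}

count_function_lines(sd)  # a small function
count_function_lines(lm)  # a considerably bigger one
```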

3392 sym R (1453 sym/7 pcs) 20 img