Publications by Paul Hiemstra

RStudio Server part 3: using an ssh tunnel for high performance

08.02.2012

In part 2 of this series of posts on RStudio Server, I commented that I suspected that RStudio Server would be fast. The first time I tried this from a remote connection, I was disappointed with the performance. Many companies filter their HTTP traffic, for example to be able to block YouTube. This takes time of course, and reduces performance. If...

1709 sym R (49 sym/1 pcs)

R and presentations: a basic example of knitr and beamer

12.02.2012

Manually combining R code and a presentation can be quite a pain. Luckily, using tools like odfWeave, Sweave and knitr, integrating documents and R code is quite painless. In this post I want to take a look at combining the knitr package with the LaTeX package beamer. I use the knitr package instead of the Sweave package because it basically ...

1381 sym R (682 sym/2 pcs)

Cleaning sentences by recursively merging words using R

13.08.2012

A question on StackOverflow caught my attention. The aim was to clean up a dataset of inappropriately spaced words. For example: > word5 <- "hotter the doghou se would be bec ause the co lor was diffe rent" My approach was to create what I call a wordpair object. The wordpair object for the example sentence looks like: > abc1_pairs w...

1057 sym R (2887 sym/4 pcs)

Custom axis transformations in ggplot2

14.08.2012

To apply a data transformation to an axis in a ggplot, you can use coordinate transformations. For more detail see the ggplot2 documentation. A number of coordinate transformations are available, including log10 and sqrt. However, performing a custom transformation is not trivial. Say the transformation involves x = 1/x. To get th...

982 sym R (215 sym/2 pcs)

Predicting the memory usage of an R object containing numbers

15.08.2012

To estimate if a certain vector of numbers will fit into memory, you can quite easily predict the memory usage based on the size of the vector. An integer vector will use 4 bytes per number, and a numeric vector 8 bytes (double precision float). The following function prints the estimated memory usage of a vector based on the size of the vector a...

1394 sym R (521 sym/2 pcs)
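
The excerpt above is cut off before the function itself, so the following is only a minimal sketch of the idea under stated assumptions, not the post's original code. The name `estimate_vector_size` and its arguments are hypothetical. It multiplies the number of elements by 4 bytes (integer) or 8 bytes (numeric) and ignores the small fixed header R adds to every vector.

```r
# Hypothetical helper (not from the original post): estimate the memory
# needed by an atomic vector of n elements, ignoring R's per-object header.
estimate_vector_size <- function(n, type = c("numeric", "integer")) {
  type <- match.arg(type)
  # 8 bytes per double, 4 bytes per integer, as stated in the excerpt
  bytes_per_element <- c(numeric = 8, integer = 4)[[type]]
  n * bytes_per_element
}

# Compare the estimate with what R reports for a real vector
est_bytes    <- estimate_vector_size(1e6, "numeric")
actual_bytes <- as.numeric(object.size(numeric(1e6)))

cat(sprintf("Estimated: %.2f MiB\n", est_bytes / 1024^2))
cat(sprintf("Actual:    %.2f MiB\n", actual_bytes / 1024^2))
```

The actual size is slightly larger than the estimate because `object.size` also counts the vector header. For large vectors that difference is negligible.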

Data Mining with R course taught by Luis Torgo

08.01.2013

From the 25th of March onwards, Dr. Luis Torgo will teach a Data Mining with R course together with the DIKW Academy in Nieuwegein, The Netherlands. Dr. Torgo is an Associate Professor at the Department of Computer Science at the University of Porto. He is also the author of the book Data Mining with R. His interests are in Machine Learning in gen...

1080 sym

Automatic spatial interpolation with R: the automap package

17.02.2013

In case of continuously collected data, e.g. observations from a monitoring network, spatial interpolation of this data cannot be done manually. Instead, the interpolation should be done automatically. To achieve this goal, I developed the automap package. automap builds on top of the excellent gstat package, and provides automatic spatial interp...

1976 sym R (309 sym/3 pcs) 2 img

Parsing complex text files using regular expressions and vectorization

24.03.2013

When text data is in a nice CSV format, read.csv is enough to parse it into a useable format. But if this is not the case, getting the data into a useable format is not so straightforward. In this post I particularly illustrate the use of regular expressions for complex and flexible text processing, and the power of vectorization in R. Vectorizat...

3179 sym R (3070 sym/12 pcs)
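
To illustrate how regular expressions and vectorization combine, here is a small sketch in base R. The input lines and the pattern are invented for illustration and do not come from the original post. Because `sub` is vectorized over its input, every line is parsed in one call, without an explicit loop.

```r
# Hypothetical input lines, made up for this example only
lines <- c("id=17; value=3.5",
           "id=42; value=-1.25",
           "id=7; value=10")

# One capture group per field we want to extract
pattern <- "^id=([0-9]+); value=(-?[0-9.]+)$"

# sub() processes the whole character vector at once; the back-references
# \\1 and \\2 select the first and second capture group respectively
parsed <- data.frame(
  id    = as.integer(sub(pattern, "\\1", lines)),
  value = as.numeric(sub(pattern, "\\2", lines))
)

print(parsed)
```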

Bubble sort implemented in pure R

10.05.2013

Please note that this is programming I purely did for the learning experience. The pure R bubble sort implemented in this post is veeeeery slow for two reasons: interpreted code with lots of iteration is very slow, and bubble sort is one of the slowest sorting algorithms (O(N^2)). The bubble sort sorting algorithm works by iterating over the unsorte...

2222 sym R (1085 sym/6 pcs) 2 img
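
As the excerpt is truncated, the following is a minimal sketch of a textbook bubble sort in pure R, not necessarily the implementation used in the post. The function name `bubble_sort` is an assumption. It repeatedly sweeps over the vector, swapping adjacent elements that are out of order, and stops once a full sweep makes no swaps.

```r
# Textbook bubble sort (a sketch; the original post's code may differ)
bubble_sort <- function(x) {
  n <- length(x)
  if (n < 2) return(x)  # nothing to sort
  repeat {
    swapped <- FALSE
    for (i in seq_len(n - 1)) {
      # Swap neighbours that are in the wrong order
      if (x[i] > x[i + 1]) {
        x[c(i, i + 1)] <- x[c(i + 1, i)]
        swapped <- TRUE
      }
    }
    # A sweep without swaps means the vector is sorted
    if (!swapped) break
  }
  x
}

set.seed(1)
v <- runif(50)
sorted_v <- bubble_sort(v)
```

In practice the built-in `sort` should be preferred. A loop like this is mainly useful for learning.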

Much more efficient bubble sort in R using the Rcpp and inline packages

14.05.2013

Recently I wrote a blog post showing the implementation of a simple bubble sort algorithm in pure R code. The downside of that implementation was that it was awfully slow. And by slow, I mean really slow, as in “a 100 element vector takes 7 seconds to sort”-slow. One of the major opportunities for a speedup is to start using a compiled language....

1668 sym R (2515 sym/2 pcs)