Publications by Michael Kuhn

Learning ggplot2: 2D plot with histograms for each dimension

03.09.2009

I have two 2D distributions and want to show on a 2D plot how they are related, but I also want to show the histograms (actually, density plots in this case) for each dimension. Thanks to ggplot2 and a Learning R post, I have sort of managed to do what I wanted. There are still two problems: The overlapping labels for the bottom-...

Comparing two-dimensional data sets in R

09.03.2011

I wanted to fit a continuous function to a discrete 2D distribution in R. I managed to do this using nls and then wanted to display the data. I discovered a nice way to compare the actual data and the fit using ggplot2, where the background is the real data and the circles are the fitted data (the legend is not optimal, but for a slide...
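A minimal sketch of the fitting step, assuming a 2D Gaussian as the continuous function. The grid, the noise and the starting values are made up for illustration and are not the data from the post. `nls()` needs rough starting values, and its documentation warns against artificial zero-residual data, which is why the synthetic values get a little noise.

```r
# Synthetic stand-in for a discrete 2D distribution: noisy values on a 10x10 grid
set.seed(1)
grid <- expand.grid(x = 1:10, y = 1:10)
true_z <- 100 * exp(-((grid$x - 5)^2 + (grid$y - 6)^2) / (2 * 2^2))
grid$z <- true_z + rnorm(nrow(grid), sd = 1)

# Fit a continuous 2D Gaussian surface; the start values are rough guesses
fit <- nls(
  z ~ a * exp(-((x - mx)^2 + (y - my)^2) / (2 * s^2)),
  data = grid,
  start = list(a = 90, mx = 4.5, my = 5.5, s = 1.8)
)

# Keep observed and fitted values side by side for comparison or plotting
grid$fitted <- predict(fit)
print(round(coef(fit), 2))
```

The `z` and `fitted` columns can then go into one ggplot2 plot, for example with the observed values as a tile background and the fitted values as circles.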

Comparing two-dimensional data sets in R; take II

10.03.2011

David commented on yesterday’s post and suggested putting the continuous fitted distribution in the background and the discrete, empirical distribution in the foreground. This looks quite nice, although there’s a slight optical illusion that makes the circles look as if they were filled with a gradient, even though they’re uniformly colore...

ggplot2: Determining the order in which lines are drawn

11.08.2011

In a time series, I want to plot the values of an interesting cluster versus the background. However, if I’m not careful, ggplot will draw the items in an order determined by their name, so background items will obscure the interesting cluster. Correct: Interesting lines in front of background. Wrong: Background lines obscure interes...
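A small sketch of one way to control that order, assuming (as the post observes) that ggplot2 draws one line per group in the order of the group variable's levels. The item names and the choice of which items count as "interesting" are hypothetical. The idea is to make the background items the first factor levels, so the interesting cluster is drawn last and ends up on top.

```r
# Hypothetical long-format time series: one row per (item, time) pair
set.seed(1)
ts_data <- expand.grid(time = 1:20, item = c("a1", "b2", "c3", "d4"))
ts_data$value <- rnorm(nrow(ts_data))

# Hypothetical members of the interesting cluster
interesting <- c("a1", "c3")
ts_data$cluster <- ifelse(ts_data$item %in% interesting, "interesting", "background")

# Put background items first and interesting items last in the factor levels,
# so lines drawn in level order paint the interesting cluster on top
background <- setdiff(levels(ts_data$item), interesting)
ts_data$item <- factor(ts_data$item, levels = c(background, interesting))

# Sort the rows the same way, in case a renderer draws in row order instead
ts_data <- ts_data[order(ts_data$item, ts_data$time), ]
print(levels(ts_data$item))  # "b2" "d4" "a1" "c3"
```

The reordered data could then be plotted with `group = item` and the colour mapped to `cluster`.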

2D plot with histograms for each dimension (2013 edition)

22.04.2013

In 2009, I wrote about a way to show density plots along both dimensions of a plot. When I ran the code again to adapt it to a new project, it didn’t work because ggplot2 has become better in the meantime. Below is the updated code. Using the gridExtra package and this hint from the ggplot2 wiki, we get this output. Source code: Re...
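For comparison, a rough sketch of the same layout (a scatter plot with a density curve along each axis) in base R graphics. This swaps in `layout()` instead of the ggplot2 and gridExtra approach from the post, uses synthetic data, and writes to a temporary PDF whose file name is an arbitrary choice. The detail that matters is giving each side panel the same axis limits as the scatter plot, so the axes line up.

```r
# Swapped-in technique: base R layout(), not the ggplot2/gridExtra code from the post
set.seed(42)
x <- rnorm(500)
y <- x + rnorm(500, sd = 0.5)  # synthetic, correlated 2D data

dx <- density(x)
dy <- density(y)
xr <- range(x)
yr <- range(y)

out <- file.path(tempdir(), "marginal_densities.pdf")
pdf(out, width = 6, height = 6)

# 2x2 grid: density of x on top, scatter bottom-left, density of y on the right
layout(matrix(c(2, 0,
                1, 3), nrow = 2, byrow = TRUE),
       widths = c(3, 1), heights = c(1, 3))

# Panel 1: the scatter plot
par(mar = c(4, 4, 0.5, 0.5))
plot(x, y, xlim = xr, ylim = yr, pch = 16, col = rgb(0, 0, 0, 0.3))

# Panel 2: density of x, sharing the scatter plot's x limits and left margin
par(mar = c(0, 4, 0.5, 0.5))
plot(dx$x, dx$y, type = "l", xlim = xr, axes = FALSE, xlab = "", ylab = "")

# Panel 3: density of y, rotated, sharing the scatter plot's y limits
par(mar = c(4, 0, 0.5, 0.5))
plot(dy$y, dy$x, type = "l", ylim = yr, axes = FALSE, xlab = "", ylab = "")

invisible(dev.off())
```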

Introducing parallelRandomForest: faster, leaner, parallelized

23.09.2013

Together with other members of Andreas Beyer’s research group, I participated in the DREAM 8 toxicogenetics challenge. While the jury is still out on the results, I want to introduce my improvement of the R randomForest package, namely parallelRandomForest. To cut to the chase, here is a benchmark made with genotype data from the D...

Creating composite figures with ggplot2 for reproducible research

10.03.2015

So far, I have been preparing composite figures by plotting the data using ggplot2, and then putting the panels together in OmniGraffle or Adobe Illustrator. Of course, every time the data is updated, I would need to go back to the vector editing program. After moving my manuscript from Word to knitr, I figured I should also try to cu...

Avoiding unnecessary memory allocations in R

08.03.2016

As a rule, everything I discover in R has already been discussed by Hadley Wickham. In this case, he writes: The reason why the C++ function is faster is subtle, and relates to memory management. The R version needs to create an intermediate vector the same length as y (x - ys), and allocating memory is an expensive operation. The C+...

New R package: a dictionary with arbitrary keys and values

11.03.2016

Coming from Python, the absence of a real dictionary in R has annoyed me for quite some time. Now, I actually needed to use vectors as keys in R:

    library(dict)
    d <- dict()
    d[[1]] <- 42               # a number as key
    d[[c(2, 3)]] <- "Hello!"   # a vector as key
    d[["foo"]] <- "bar"        # a string as key
    d[[1]]
    d[[c(2, 3)]]
    d$get("not here", "default")
    d$keys()
    d$values()
    d$items()
    # [[ ]] gives an error for unknown k...
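To show what such a package saves you from, here is a rough sketch of a base-R workaround; it is not the dict package's implementation. Environments only accept strings as keys, so a vector key first has to be encoded as a string, here with a hypothetical helper `key_str()` built on `deparse()`. Keys that deparse to the same string collide, and `1` and `1L` end up as different keys.

```r
# Hypothetical helper: encode any R value as a single string usable as an env key
key_str <- function(key) paste(deparse(key), collapse = "")

d <- new.env(hash = TRUE)
assign(key_str(1), 42, envir = d)
assign(key_str(c(2, 3)), "Hello!", envir = d)
assign(key_str("foo"), "bar", envir = d)

# Hypothetical lookup that falls back to a default when the key is absent
get_or_default <- function(env, key, default) {
  get0(key_str(key), envir = env, inherits = FALSE, ifnotfound = default)
}

get_or_default(d, c(2, 3), NULL)          # "Hello!"
get_or_default(d, "not here", "default")  # "default"
```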

854 sym R (216 sym/1 pcs)