Publications by That’s so Random

Quickly Check your id Variables

20.07.2017

Virtually every dataset has them: id variables that link a record to a subject and/or time point. Often one column, or a combination of columns, forms the unique id of a record, for instance the combination of patient_id and visit_id, or ip_adress and visit_time. The first step in most of my analyses is almost always checking the uniqueness of a...

1443 sym R (617 sym/5 pcs)

Span Dates and Times without Overhead

26.07.2017

I am working on v.0.4.0 of the padr package this summer. Two new features that will be added are wrappers around seq.Date and seq.POSIXt. Since it is going to take a while before the new release is on CRAN, I am going ahead with an early presentation of these functions. Date and datetime parsing in base R is powerful and comprehensive, but also tedi...

3332 sym R (1602 sym/18 pcs)

Tidy evaluation, most common actions

25.08.2017

Tidy evaluation is a bit challenging to get your head around. Even after reading Programming with dplyr several times, I still struggle from time to time when creating functions. I made a small summary of the most common actions I perform, so I don’t have to dig through the vignettes and Stack Overflow over and over. Each is accompanied by a min...

1555 sym R (1838 sym/16 pcs)

Non-standard evaluation, how tidy eval builds on base R

10.09.2017

As with many aspects of the tidyverse, its non-standard evaluation (NSE) implementation is not something entirely new, but is built on top of base R. What makes this one so challenging to get your mind around is that the Honorable Doctor Sir Lord General and friends brought concepts to the realm of the mortals that many of us had no, or only a vagu...

14720 sym R (3075 sym/73 pcs)

A ggplot-based Marimekko/Mosaic plot

01.11.2017

One of my first baby steps into the open source world was answering this SO question over four years ago. Recently I revisited the post and saw that Z.Lin did a very nice and more modern implementation, using dplyr and faceting in ggplot2. I decided to merge these ideas with mine to create a general function that makes MM plots. I also add...

2709 sym R (3770 sym/7 pcs) 10 img

padr version 0.4.0 now on CRAN

17.11.2017

I am happy to share that the latest version of padr just hit CRAN. This new version comprises bug fixes, performance improvements and new functions for formatting datetime variables. But above all, it introduces the custom paradigm that enables you to do asymmetric analysis. Improvements to existing functions: thicken used to get slowish when the ...

4476 sym R (1529 sym/5 pcs) 8 img

A two-stage workflow for data science projects

27.11.2017

If you are a data scientist who primarily works with R, chances are you have had no formal training in software development. I certainly did not pick up many skills in that direction during my statistics master's. For years my workflow was basically to load a dataset and hack away at it. In the best case my R-script came to some kind of conclusion or fina...

6108 sym

Color palettes derived from the Dutch masters

13.12.2017

Along with tulip fields, canals and cheese sampling, the museums of the Netherlands are among its biggest tourist attractions. And for very good reasons! During the seventeenth century, known as the Dutch Golden Age, there was an abundance of talented painters. If you ever have the chance to visit the Rijksmuseum you will be in awe of the landscap...

3083 sym R (311 sym/3 pcs) 6 img

Make your own color palettes with paletti

22.12.2017

Last week I blogged about the dutchmasters color palettes package, which was inspired by the wonderful ochRe package. As mentioned, I shamelessly copied that package. I replaced the list with character vectors containing hex colors and did a find and replace to make it dutchmasters instead of ochRe. This was pretty ugly. I realized that when ...

2592 sym R (997 sym/8 pcs) 14 img

A recipe for recipes

29.05.2018

If you build statistical or machine learning models, the recipes package can be useful for data preparation. A recipe object is a container that holds all the steps that should be performed to go from the raw data set to the set that is fed into a model algorithm. Once your recipe is ready it can be executed on a data set at once, to perform all ...

10789 sym R (6117 sym/18 pcs)