Publications by Random R Ramblings

Piping Within Pipes

21.11.2016

The magrittr pipe (%>%) has revolutionised the way many people write R code. I’ve been using R for over 7 years and the pipe has become a staple of my programming conventions. However, it was recently brought to my attention that you can actually use pipes within function calls, which can make my code even more human-readable. Take for examp...

1284 sym R (435 sym/4 pcs) 2 img

Automatically Building, Testing and Deploying bookdown with Travis and GitHub Pages

16.05.2017

I recently began writing some documentation around coding standards using bookdown. The workflow I was using was to write the new sections of the book, build the book and then push the changes to GitHub where it is hosted using GitHub Pages. This was clearly one manual step too far for me as I consistently forgot to build the book before I pushed...

2385 sym R (855 sym/3 pcs) 2 img

The Twitter Waterflow Problem

09.08.2017

Introduction I was recently introduced to the Twitter Waterflow Problem and I decided it was interesting enough to try and complete the challenge in R. Consider the following picture: This plot shows a series of walls and empty valleys. We can represent this picture by an array of integers, where the value at each index is the height of the wall...

5384 sym R (4243 sym/10 pcs) 4 img

Accessing Private Methods from an R6 Class

13.08.2017

I recently wrote a package to solve the Twitter Waterflow Problem using an R6 class. You can view the package here and read about how I approached the problem here. In this blog post, I want to highlight how you can access private members of an R6 class which Winston Chang mentioned in his useR!2017 talk. I will use the waterflow package for this...

1299 sym R (620 sym/2 pcs)

Project Euler in R

28.02.2018

Project Euler This is just a short blog post to raise awareness of some fun programming and mathematical challenges I recently came across, hosted on Project Euler. The idea behind Project Euler is to provide abstract programming challenges for people to develop their skills and learn new concepts in a recreational way. The problems range in...

1891 sym

Extending sparklyr: Data Types

08.03.2018

TL;DR sparklyr maps R data types and data storage types to Scala, but it doesn’t handle all data storage types. This blog post discusses how to generate, from the R side, the Scala data storage types that sparklyr does not generate. You can do this by using the sparklyr::invoke_new function to generate the objects you want in Java or Scala, for e...

4814 sym R (1043 sym/5 pcs) 1 tbl

No visible binding for global variable

18.08.2019

Recently I have been working on a very large legacy project which utilises the excellent data.table package throughout. The result is an R CMD check containing literally thousands of NOTEs similar to the following: ❯ checking R code for possible problems ... NOTE my_fn: no visible binding for global variable ‘mpg’ There are...

6005 sym R (1840 sym/16 pcs)

Including Optional Functionality from Other Packages in Your Code

05.09.2019

Introduction Let’s say you want to write a function with optional functionality which is dependent on the installation of a package that your colleague may not have installed. For example, let’s say you want to have an option to return a data.table (or a tibble) instead of a data.frame, but in this case you don’t want to force your function...

4915 sym R (2331 sym/8 pcs)

Selecting the max value from each group, a case study: base R

14.09.2019

Introduction Let’s say we wish to group some data by a variable, then for each group find the row holding the maximum value of another variable, and finally extract that entire row. This is a fairly common task and in fact I’ve had to use this exact data exploration technique on several occasions in the last week using different synt...

3081 sym R (1017 sym/3 pcs)

Selecting the max value from each group, a case study: data.table

14.09.2019

Introduction In my last post we looked at how to slice a data.frame by group to obtain the rows for which a particular column in that group is at its maximum value using base R. In this post, we will be taking a look at how to perform this task using data.table. data.table Solution(s) For this exercise we will be using datasets::mtcars and so fir...

3977 sym R (1346 sym/6 pcs)