Publications by HighlandR

new programming with data.table

04.02.2024

The newest version of data.table has hit CRAN, and there are lots of great new features. Among them are a %notin% function and a new let function that can be used instead of := (I wasn’t too fussed about this originally but have tried it a few times today and I may well adopt it – although I do like that := really stands out in my code when assignin...

3266 sym R (3345 sym/10 pcs)

more .I in data.table

02.02.2024

Following on from my last post, here is a bit more about the use of .I in data.table. Scenario: you want to obtain either the first or the last row from a set of rows that belong to a particular group. For example, for a patient admitted to hospital, you may want to capture their first admission, or the entire time they were in a specific hospital (...

1990 sym R (714 sym/1 pcs)
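
A minimal sketch of the first-or-last-row-per-group scenario described in the excerpt above. The table and its column names (patient, hospital, admitted) are invented for illustration and are not taken from the original post.

```r
library(data.table)

# Hypothetical admissions data: one row per hospital stay (assumed columns)
DT <- data.table(
  patient  = c("A", "A", "A", "B", "B"),
  hospital = c("H1", "H1", "H2", "H1", "H3"),
  admitted = as.Date(c("2024-01-01", "2024-01-05", "2024-01-09",
                       "2024-01-02", "2024-01-07"))
)
setorder(DT, patient, admitted)

# .I holds row numbers within DT, so .I[1L] is the first row of each group
# and .I[.N] is the last row of each group
first_idx <- DT[, .I[1L], by = patient]$V1
last_idx  <- DT[, .I[.N], by = patient]$V1

# Use the row numbers to subset the original table
first_admission <- DT[first_idx]
last_admission  <- DT[last_idx]
```

Collecting row numbers first and subsetting afterwards keeps every column of the matching rows without listing them in j.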

.I in data.table

02.01.2024

In this post I’m using a small extract from the SIMD2020 dataset to figure out what the special operator .I does. Files and code are on GitHub if you’re interested # files and code : https://github.com/johnmackintosh/DT_dot_I library(data.table) DT <- fread("highdata.csv") lookup <- fread("https://raw.githubusercontent.com/johnmackintosh/ph_loo...

3237 sym R (4872 sym/18 pcs)

non-equi joins in data.table

21.12.2023

I have been toying with some of the Advent of Code challenges (I am way behind though!). For day 5, I had to create a function, and I’m writing this up, because it’s an example of a non-equi join between two tables. In this particular situation, there are no common columns between the two tables, so my usual data.table hack of copying the c...

2121 sym R (808 sym/4 pcs)
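
A minimal sketch of a non-equi join between two tables with no common column names, as mentioned in the excerpt above. This is not the Advent of Code solution from the post; the tables `values` and `ranges` and their columns are assumptions made up for this example.

```r
library(data.table)

# Hypothetical inputs: single values, and ranges they may fall into
values <- data.table(value = c(5L, 12L, 27L, 40L))
ranges <- data.table(
  lower = c(0L, 10L, 20L),
  upper = c(9L, 19L, 29L),
  label = c("low", "mid", "high")
)

# For each row of `values`, find the range where lower <= value <= upper.
# In on = .(...), the left-hand name is a column of `ranges` (x) and the
# right-hand name is a column of `values` (i).
result <- ranges[values,
                 on = .(lower <= value, upper >= value),
                 .(value = i.value, label = x.label)]
```

The `i.` and `x.` prefixes in j make explicit which table each output column comes from; a value that falls in no range (40 here) gets an NA label.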

Achieve your target

20.05.2023

Last week I had to talk my colleagues through the architecture of an R project that we’ve been working on for a while. This is a large project, as we make our first moves into Reproducible Analytical Pipelines, and it makes heavy use of the {targets} package. As I was going through it, I realised that it was way too complex, and it wasn’t reasonabl...

3426 sym R (131 sym/1 pcs)

Pivoting in tidyr and data.table

19.02.2023

We all need to pivot data at some point, so these are just some notes for my own benefit really, because gather and spread are no longer in favour within tidyr. NB – this post has been updated with collapsible sections to show/hide the data and outputs. I tended to only ever need gather, and nearly always relied on the same key and value names, s...

1273 sym R (2297 sym/3 pcs)

Pivoting in tidyr and data.table

17.02.2023

We all need to pivot data at some point, so these are just some notes for my own benefit really, because gather and spread are no longer in favour within tidyr. I tended to only ever need gather, and nearly always relied on the same key and value names, so it was an easy function for me to use. I have discovered that pivot_longer and pivot_wider ar...

4673 sym R (7668 sym/24 pcs)
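
A minimal sketch of the data.table side of pivoting, assuming a made-up table with columns id, q1 and q2 (none of these come from the original post). melt() reshapes wide to long, in the way tidyr's pivot_longer does, and dcast() reshapes long to wide, in the way pivot_wider does.

```r
library(data.table)

# Hypothetical wide data: one row per id, one column per measurement
wide <- data.table(
  id = 1:2,
  q1 = c(10, 20),
  q2 = c(11, 21)
)

# Wide to long: one row per id/measurement pair, using the familiar
# key and value column names
long <- melt(wide, id.vars = "id",
             variable.name = "key", value.name = "value")

# Long back to wide: one column per distinct value of key
wide_again <- dcast(long, id ~ key, value.var = "value")
```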

Making headlines

25.01.2023

In my current mammoth work project, I’m generating many plots. The titles are very descriptive (they tell you what the plot is about), but they are not really telling a story. That’s simply because there are so many on the production line. What we’d like is to analyse the data and extract the salient points. Better still, we’d want this...

4483 sym R (3143 sym/6 pcs) 2 img

On target

13.12.2022

Here are some notes on getting started with {targets}. The project I am working on involves several different reports, each at least 30 pages, and each with about 20 plots and 20 tables per document. As well as a myriad of functions, I had 7 very large R scripts doing the data munging and processing. I thought they were well ordered, but I had to...

3815 sym R (410 sym/2 pcs) 4 img

Dynamic schema, table and column names in SQL Server queries as a gateway to functional programming with R

05.04.2022

I’m looking into creating some functions to make it easier to carry out quality checks on our database tables. I’m using SQL Server, where tables are referred to in a [database].[schema].[tablename] format, although you can forgo some of this by using the USE statement – USE my_data_base GO Let’s say you want to run a simple query: SELEC...

2151 sym R (808 sym/7 pcs)