Publications by S. Richter-Walsh

ggplot2 style plotting in Python

08.05.2017

R is my language of choice for data science, but a good data scientist should have some knowledge of all of the great tools available to them. Recently, I have been gleefully using Python for machine learning problems (specifically pandas and the wonderful scikit-learn). However, for all its greatness, I couldn’t help but feel it lacks a bit in ...

5093 sym R (1070 sym/12 pcs) 20 img

Bland-Altman/Tukey Mean-Difference Plots using ggplot2

31.05.2017

A very useful data visualisation tool in science, particularly in medical and sports settings, is the Bland-Altman/Tukey Mean-Difference plot. When comparing two sets of measurements of the same variable made by different instruments, it is often necessary to determine whether the instruments agree. Correlation and linear regres...

2543 sym R (872 sym/5 pcs) 6 img

Ordinary Least Squares (OLS) Linear Regression in R

04.07.2017

Ordinary Least Squares (OLS) linear regression is a statistical technique used for the analysis and modelling of linear relationships between a response variable and one or more predictor variables. If the relationship between two variables appears to be linear, then a straight line can be fit to the data in order to model the relationship. The l...

4797 sym R (1142 sym/8 pcs) 10 img

Useful dplyr Functions (w/examples)

10.07.2017

The R package dplyr is an extremely useful resource for data cleaning, manipulation, visualisation and analysis. It contains a large number of very useful functions and is, without doubt, one of my top 3 R packages today (ggplot2 and reshape2 being the others). When I was learning how to use dplyr for the first time, I used DataCamp, which offers ...

3752 sym R (5456 sym/13 pcs) 4 img

Cats are great and so is the forcats R package

28.03.2018

Cats are great. Perhaps Hadley Wickham and Lionel Henry think so too given the wonderful choice of name for their purrr package. Hadley Wickham has also created a superb package called forcats, likely an abbreviation of “for categoricals” but wittingly cat-themed, which is very, very useful to the data scientist. In the data science profess...

4716 sym R (645 sym/6 pcs) 2 img