Publications by That’s so Random

Why your S3 method isn’t working

15.06.2018

Over the last few years I have noticed the following happening to a number of people. One of those people was actually yours truly a few years back. Person is aware of S3 methods in R through regular use of the print, plot and summary functions and decides to give it a go in their own work. Creates a function that assigns a class to its output and then imp...

3663 sym R (963 sym/6 pcs)

Dealing with failed projects

22.11.2018

Recently, I came up with Thoen’s law. It is an empirical one, based on several years of doing data science projects in different organisations. Here it is: The probability that you have worked on a data science project that failed approaches one very quickly as the number of projects done grows. I think many, far more than we as a community li...

6854 sym

Using Rstudio Jobs for training many models in parallel

26.02.2019

Recently, RStudio added the Jobs feature, which allows you to run R scripts in the background. Computations are done in a separate R session that is not interactive, but just runs the script. In the meantime your regular R session stays live so you can do other work while waiting for the Job to complete. Instead of refreshing your Twitter for the 15...

7070 sym R (834 sym/4 pcs)

Code and Data in a large Machine Learning project

17.03.2019

We did a large machine learning project at work recently. It involved two data scientists, two backend engineers and a data engineer, all working on-and-off on the R code during the project. The project had many aspects that were interesting and new to me, among them doing data science in an agilish way, how to keep track of the different model version...

6561 sym R (417 sym/1 pcs)

Code and Data in a large Machine Learning project

18.03.2019

We did a large machine learning project at work recently. It involved two data scientists, two backend engineers and a data engineer, all working on-and-off on the R code during the project. The project had many aspects that were interesting and new to me, among them doing data science in an agilish way, how to keep track of the different model version...

6561 sym R (427 sym/1 pcs)

Predictability of Tennis Grand Slams

26.05.2019

The European tennis season is in full swing, with Roland Garros starting today and Wimbledon taking place in a few weeks. For a sports buff like me, it is the essence of summer (together with the Tour de France). Time to dive into some tennis data. As a follower of both the men’s and the women’s tour it occurred to me that in the latter the t...

4986 sym R (1953 sym/2 pcs) 2 img 1 tbl

padr is updated

12.06.2019

Yesterday v.0.5.0 of the padr package hit CRAN. You will find the main changes in the thicken function, which has gained two new arguments. First of all, following an idea by Adam Stone, you can now drop the original datetime variable from the data frame by using drop = TRUE. This argument defaults to FALSE to ensure backwards compatibility. W...

2479 sym R (521 sym/2 pcs)

The Psychology of Flame Wars

26.06.2019

I have been meaning to write this for a while, but with the dplyr vs data.table feud rising to new levels on Twitter over the last couple of days, it all of a sudden seems more relevant. For those who don’t know what I am talking about, there are different ways of doing data science. There are the two major languages R and Python, with their own imp...

7924 sym