Publications by Roel M. Hogervorst

Cleaning up and combining data, a dataset for practice

11.03.2018

TL;DR: I created an open dataset for the explicit practice of data munging. Feel free to use it in assignments, but do mention where you got it from (CC-BY-4.0). Also, unicorns are awesome. Find the dataset at: https://github.com/RMHogervorst/unicorns_on_unicycles Data munging / cleaning / engineering: At work I was working with two Excel files th...

3786 sym

Reading in an epub (ebook) file with the pubcrawl package

18.07.2018

In this tutorial I show how to read an epub file (e.g. from the ebook collection on your computer) into R with the pubcrawl package. I will show the reading-in part (one line of code) and some other actions you might want to perform on text files before they are ready for text analysis. After you read in your epub...

7948 sym R (13720 sym/23 pcs) 12 img

Arthur blinked, Ford shrugs, but Zaphod leapt; text as graph

23.07.2018

Can we make the computer say something about characters in a book? In this piece I will search for the names of characters and the words around those names in books. What can we learn about a character from text analysis? Of course it’s also just another excuse for me to read the Hitchhiker’s series! I will break down the text into chunks of two...

6619 sym R (17120 sym/19 pcs) 24 img

Make more useless packages!

30.08.2018

You should make more useless packages. To be more specific: make packages that are useful to you, but might be useless to others. Because building silly stuff is fun and sets the bar low for you to play and learn. I’m a big fan of Simone Giertz (see all the gifs in this post). Simone is known as the ‘Queen of Shitty Robots’ and has a yout...

4222 sym 6 img

Use `purrr` to feed four cats

09.09.2018

In this example we will show you how to go from a ‘for loop’ to purrr. Use this as a cheatsheet when you want to replace your for loops. Imagine having 4 cats (like this one): four real cats who need food, care and love to live a happy life. They are starting to meow, so it’s time to feed them. Our real life al...

5748 sym R (5924 sym/8 pcs) 2 img

Interactive ggplot with tooltip using plotly

12.09.2018

A quick random R thing that I recently learned, use a lot, and want you to know too. In this post I’ll show you how to make a quick interactive plot with ggplot and plotly, so that values are displayed when you hover your mouse over them. Why would you want this? If you are exploring the data, you want some quick insights into which values are wh...

1346 sym R (5111 sym/4 pcs) 2 img

Tweeting wikidata info

18.11.2018

In this explainer I walk you through the steps I took to create a Twitter bot that tweets daily about people who died on that date. I created a script that queries Wikidata, takes that information and creates a sentence. That sentence is then tweeted. For example: A tweet I literally just sent out from the Docker container I hope you are has ex...

5526 sym R (4163 sym/5 pcs) 4 img

Running an R script on heroku

05.12.2018

In this post I will show you how to run an R script on Heroku every day. This is a continuation of my previous post on tweeting a death from Wikidata. Why would I want to run a script on Heroku? It is extremely simple: you don’t need to spin up a machine in the cloud on AWS, Google, Azure or Nerdalize. You can just run the script and it works....

3066 sym R (322 sym/2 pcs) 4 img

Graphing My Daily Phone Use

27.01.2019

How many times do I look at my phone? I set up a small program on my phone to count the screen activations and log them to a file. In this post I show what went wrong and how to plot the results. The data: I set up a small program on my phone that counts every day how many times I use my phone (to be specific, it counts the times the screen has bee...

2599 sym R (1365 sym/5 pcs) 4 img

Quick post – detect and fix this ggplot2 antipattern

06.03.2019

Recently one of my coworkers showed me a ggplot and, although it is not wrong, it is also not ideal. Here is the TL;DR: Whenever you find yourself adding multiple geom_* layers to show different groups, reshape your data. In software engineering there are things called antipatterns, ways of programming that lead you into potential trouble. This is one ...

2423 sym R (5648 sym/6 pcs) 10 img