Publications by Roel M. Hogervorst

Gosset part 2: small sample statistics

10.08.2019

This post is an explainer about the small-sample experiment and how to determine the ideal sample size for inference. Economic perspectives and business logic. Brewing beer at scale. One of the problems William S. Gosset worked on was determining the quality of malt. To brew beer you need three ingredients: yeast, hops, and a cereal gr...

Gosset part 2: small sample statistics

10.10.2019

Simulation was the key to achieving world beer dominance. ‘Scientific’ brewing at scale in the early 1900s. This post is an explainer about the small-sample experiments performed by William S. Gosset. It contains some R code that simulates his simulations¹ and the resulting determination of the ideal sample size ...

Scraping GDPR Fines

07.04.2020

The website Privacy Affairs keeps a list of fines related to GDPR. I heard that this might be an interesting dataset for TidyTuesday. At the moment the dataset contains 250 fines given out for GDPR violations, and it was last updated (according to the website) on 31 March 2020. All data is from official government sources, such as official report...

Where does the output of Rscript go?

13.04.2020

We often run R interactively, through RStudio or in the terminal. But you can also run R scripts without manual intervention, using Rscript. But where does the output go? Warning: this post is very Linux/Unix (macOS) centred; I don’t know how this works on Windows. Also, I’m using the standard Linux shell, bash. I believe there are some ...

Munging and reordering Polarsteps data

22.04.2020

This post is about how to extract data from a JSON file, turn it into a tibble, and do some work with the result. I’m working with a download of personal data from Polarsteps. (Pictured: Tokomaru Wharf, New Zealand.) I spent a month in New Zealand, birthplace of R and home to Hobbits. I logged my travel using the Polarsteps application. The app allo...

New Package: Pinboardr

10.05.2020

I’ve created a new package to interact with Pinboard (not to be confused with Pinterest). I noticed there wasn’t a package for it yet, and the API is fairly clear. So come and check out {pinboardr} at https://github.com/RMHogervorst/pinboardr. I did see a new package to interact with Pocket: pocketapi. Since Pocket is also a kind of bookmark manager ...

Expressing size in bananas: a dive into {vctrs}

07.06.2020

Recently I’ve become interested in the relative sizes of things. Maybe I’m paying more attention to my surroundings since I’ve been locked at home for so long. Maybe my inner child is finally breaking free. Whatever the reason, I channeled all of that into two packages: everydaysizes, a rather unfinished collection of dimensions of everyday objects. ...

How to Use LightGBM with Tidymodels

26.08.2020

So you want to compete in a Kaggle competition with R, and you want to use tidymodels. In this how-to I show how you can use LightGBM (LGBM) with tidymodels. I give very terse descriptions of what the steps do, because I believe you are reading this post for the implementation, not for background on how the elements work. Why tidymodels? It is a unified machine l...

How to Use CatBoost with Tidymodels

27.08.2020

So you want to compete in a Kaggle competition with R, and you want to use tidymodels. In this how-to I show how you can use CatBoost with tidymodels. I give very terse descriptions of what the steps do, because I believe you are reading this post for the implementation, not for background on how the elements work. This tutorial is extremely similar to my previou...

Running an R Script on a Schedule: Heroku

20.09.2020

In this tutorial I have an R script that runs every day on Heroku. It creates a curve in ggplot2 and posts that picture to Twitter. The use case is this: you have a script that needs to run on a schedule (for instance, every day). In 2018 I wrote a small post on how to run an R script on Heroku. The amazing thing is that the bot I created back then...
