Publications by schochastics
Efficient row min calculation: From R to C
My colleague Chung-hong Chan started a new package in our team's GitHub organization. An issue there caught my attention: the main function was very slow, and the cause lay somewhere in the auxiliary functions. This led me down quite a rabbit hole to optimize the calculation of row minimums (you can skip the prelude, if you are not i...
3488 sym Python (4796 sym/10 pcs)
adaR: An accurate, fast and WHATWG-compliant URL parser
The other week, I found an interesting-looking library on GitHub: ada-url, a WHATWG-compliant and fast URL parser written in modern C++. Since we need such a thing at work to analyze web tracking data, and I recently successfully wrapped my first C++ library into an R package, I thought I could do the same with ada-url. Little did I know that wrapp...
4566 sym R (8161 sym/11 pcs) 2 img
Preprocessing and analyzing web tracking data with webtrackR
Researchers have relied on free/easy access to APIs from social media platforms for a very long time. But in the recent past, many prominent platforms revoked the free access to their API and made accessing the data almost unaffordable for regular researchers. The need for alternative data sources to study the online behaviour of individuals is big...
4388 sym R (851 sym/3 pcs) 2 img
Create a CV with Quarto
In this post, I will introduce a few extensions for Quarto to create nice-looking CVs. Disclaimer: the underlying templates were not created by me but were adapted from these LaTeX templates for CVs and resumes and this modern LaTeX CV. Creating the templates was quite straightforward (I wish I had time to write a detailed post about that…). ...
1369 sym R (171 sym/4 pcs) 8 img
A suite of tools to scrape and parse search engine results
My posts are usually R only. But in this post, I want to talk about a suite of tools developed by my colleagues and me that goes beyond R only. This suite of tools helps to gather results from different search engines and includes a browser extension to scrape the results, and a Python library and an R package to parse the results. The browser ext...
5043 sym R (2656 sym/12 pcs) 2 img
rang: make ancient R code run again
Reproducibility is a big issue in the (computational) world of science. Code that runs today might not run tomorrow because packages are updated, functions deprecated or removed, and whole programming languages change. In the case of R, there exists a great variety of packages to ensure that code written today also runs tomorrow (and hopefully also...
3777 sym R (1493 sym/8 pcs) 2 img
Extending network analysis in R with netUtils
During the last 5 years, I have accumulated various scripts with (personal) convenience functions for network analysis, and from time to time I also implemented new methods which I could not find in any other R package. The package netUtils gathers all these functions and makes them available for anyone who may also need to apply “non-standa...
3454 sym R (4828 sym/8 pcs) 8 img
rtoot: Collecting and Analyzing Mastodon Data
It has been a wild few days on Twitter after Elon Musk took over. The future of the platform is unclear and many users are looking for alternatives, a popular one being Mastodon. I also decided to give it a try and signed up. I quite quickly became interested in its API and realized that there is only a seemingly unmaintained R package on GitHub...
3911 sym R (17834 sym/11 pcs) 2 img
Academicons: my first quarto extension
I have been following the development of Quarto for a while now and I am pretty excited about it. Not only its features but also its rich and detailed documentation will make me transition from R Markdown to Quarto in the long run. While moving my personal webpage, I realized though that I am still missing some features. Quarto is still in its ear...
2796 sym R (1022 sym/4 pcs) 2 img
Dimensionality Reduction Methods Using FIFA 18 Player Data
In this post, I will introduce three different methods for dimensionality reduction of large datasets.
# used packages
library(tidyverse) # for data wrangling
library(stringr) # for string manipulations
library(ggbiplot) # pca biplot with ggplot
library(Rtsne) # implements the t-SNE algorithm
library(kohonen) # implements self organi...
9659 sym R (8940 sym/14 pcs) 14 img