Publications by Jozef's Rblog

Automating R package checks across platforms with GitHub Actions and Docker in a portable way

18.04.2020

Introduction Automating the execution, testing and deployment of R code is a powerful way to ensure the reproducibility, quality and overall robustness of the code that we are building. A relatively recent feature in GitHub – GitHub Actions – allows us to do just that without using additional tools such as Travis or Jenkins for our repositor...

9415 sym R (474 sym/3 pcs) 2 img

A review of my experience with the Functional Programming Principles in Scala course

13.06.2020

Introduction Functional programming is a programming paradigm in which programs are constructed by applying and composing functions. It is quite popular in data science applications because some of its properties can help, for example, with scaling computations. One well-known resource to get into functional programming is the Functi...

9793 sym 2 img

Exploring and plotting positional ice hockey data on goals, penalties and more from R with the {nhlapi} package

04.07.2020

Introduction The National Hockey League (NHL) is considered to be the premier professional ice hockey league in the world, founded 102 years ago in 1917. Like many other sports, the data about teams, players, games, and more are a great resource to dive in and analyze using modern software tools. Thanks to the open NHL API, the data is accessible...

4807 sym R (8771 sym/9 pcs) 6 img

A review of my experience with the Big Data Analysis with Scala and Spark course

25.07.2020

Introduction Apache Spark is an open-source distributed cluster-computing framework implemented in Scala that first came out in 2014 and has since become popular for many computing applications, including machine learning, thanks to, among other aspects, its user-friendly APIs. This popularity also gave rise to many online courses of varied quali...

8803 sym R (2578 sym/2 pcs) 2 img

A guide to retrieval and processing of data from relational database systems using Apache Spark and JDBC with R and sparklyr

15.08.2020

Introduction The {sparklyr} package lets us connect to and use Apache Spark for high-performance, highly parallelized, and distributed computations. We can also use Spark’s capabilities to improve and streamline our data processing pipelines, as Spark supports reading and writing from many popular sources such as Parquet, ORC, etc. and most databa...

9931 sym R (6075 sym/12 pcs) 2 img

Optimizing partitioning for Apache Spark database loads via JDBC for performance

26.12.2020

Introduction Apache Spark is a popular open-source analytics engine for big data processing, and thanks to the sparklyr and SparkR packages, the power of Spark is also available to R users. Apart from using HDFS-based data storage, a very common task when working with Spark is interfacing with traditional RDBMS systems such as Oracle, MS SQL Ser...

9227 sym R (4264 sym/9 pcs) 6 img