Publications by Jozef's Rblog
Using parallelization, multiple git repositories and setting permissions when automating R applications with Jenkins
Introduction In the previous post, we focused on setting up declarative Jenkins pipelines with emphasis on parametrizing builds and using environment variables across pipeline stages. In this post, we look at various tips that can be useful when automating R application testing and continuous integration, with regard to orchestrating paralleliz...
4932 sym R (1192 sym/6 pcs) 2 img
Using Spark from R for performance with arbitrary code – Part 1 – Spark SQL translation, custom functions, and Arrow
Introduction Apache Spark is a popular open-source analytics engine for big data processing and thanks to the sparklyr and SparkR packages, the power of Spark is also available to R users. This series of articles will attempt to provide practical insights into using the sparklyr interface to gain the benefits of Apache Spark while still retaining...
11879 sym R (3125 sym/13 pcs) 2 img
Using Spark from R for performance with arbitrary code – Part 2 – Constructing functions by piping dplyr verbs
Introduction In the first part of this series, we looked at how the sparklyr interface communicates with the Spark instance and what this means for performance with regard to arbitrarily defined R functions. We also examined how Apache Arrow can increase the performance of data transfers between the R session and the Spark instance. In this sec...
7220 sym R (12363 sym/15 pcs) 2 img
Using Spark from R for performance with arbitrary code – Part 3 – Using R to construct SQL queries and let Spark execute them
Introduction In the previous part of this series, we looked at writing R functions that can be executed directly by Spark without serialization overhead with a focus on writing functions as combinations of dplyr verbs and investigated how the SQL is generated and Spark plans created. In this third part, we will look at how to write R functions t...
8406 sym R (10902 sym/16 pcs) 2 img
Using Spark from R for performance with arbitrary code – Part 4 – Using the lower-level invoke API to manipulate Spark’s Java objects from R
Introduction In the previous parts of this series, we have shown how to write functions as both combinations of dplyr verbs and SQL query generators that can be executed by Spark, how to execute them with DBI and how to achieve lazy SQL statements that only get executed when needed. In this fourth part, we will look at how to write R functions t...
8503 sym R (6312 sym/15 pcs) 2 img
Using Spark from R for performance with arbitrary code – Part 5 – Exploring the invoke API from R with Java reflection and examining invokes with logs
Introduction In the previous parts of this series, we have shown how to write functions both as combinations of dplyr verbs and as SQL query generators that can be executed by Spark, and how to use the lower-level API to invoke methods on Java object references from R. In this fifth part, we will look into more details around sparklyr’s invoke() API,...
7846 sym R (10760 sym/15 pcs) 2 img
4 great free tools that can make your R work more efficient, reproducible and robust
Introduction It is Christmas time again! And just like last year, what better time than this to write about the great tools that are available to all interested in working with R. This post is meant as praise for a few selected tools and packages that helped me to be more efficient and productive with R in 2019. In this post, we will praise fre...
8674 sym R (1130 sym/9 pcs) 10 img
Releasing and open-sourcing the Using Spark from R for performance with arbitrary code series
Introduction Over the past months, we published and refined a series of posts on Using Spark from R for performance with arbitrary code. Since the posts have grown in size and scope, blog posts were no longer the best medium for sharing the content in the way most useful to readers, so we decided to compile a publication instead and open-source i...
4089 sym 2 img
R is turning 20 years old next Saturday. Here is how much bigger, stronger and faster it got over the years
Introduction It is almost the 29th of February 2020! A day that is very interesting for R, because it marks 20 years since the release of R v1.0.0, the first official public release of the R programming language. In this post, we will look back on the 20 years of R with a bit of history and 3 interesting perspectives: how much faster did R get...
7557 sym R (322 sym/1 pcs) 2 img
Setting up R with Visual Studio Code quickly and easily with the languageserversetup package
Introduction Over the past years, R has been gaining popularity, bringing to life new tools to work with it. Thanks to the amazing work by contributors implementing the Language Server Protocol for R and writing Visual Studio Code Extensions for R, the most popular development environment amongst developers across the world now has very strong sup...
7056 sym R (137 sym/3 pcs) 4 img