Publications by Marek Gągolewski
ICU Unicode text transforms in the R package stringi
The ICU (International Components for Unicode) library provides powerful and flexible ways to apply various Unicode text transforms. These include full (language-specific) case mappings, Unicode normalization, and text transliteration (e.g. script-to-script conversion). All of these are available to R programmers/users via our still maturing ...
5750 sym R (2904 sym/14 pcs)
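The three kinds of transforms named in the excerpt above can be sketched as follows. This is a minimal illustration, assuming a recent stringi build; the sample strings and the choice of the `Latin-ASCII` transliterator are ours, not taken from the post.

```r
library(stringi)

# 1. Language-specific case mapping: Turkish distinguishes dotted and
#    dotless "i", so upper-casing depends on the locale.
upper_tr <- stri_trans_toupper("i", locale = "tr_TR")  # U+0130, dotted capital I
upper_en <- stri_trans_toupper("i", locale = "en_US")  # plain "I"

# 2. Unicode normalization: "a" followed by a combining acute accent
#    (two code points) composes into one precomposed character under NFC.
decomposed <- "a\u0301"
composed   <- stri_trans_nfc(decomposed)
len_before <- stri_length(decomposed)  # 2
len_after  <- stri_length(composed)    # 1

# 3. Transliteration: strip diacritics with ICU's Latin-ASCII transform.
ascii_name <- stri_trans_general("G\u0105golewski", "Latin-ASCII")  # "Gagolewski"
```

Other transliterator identifiers can be listed with `stri_trans_list()`.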
Faster, easier, and more reliable character string processing with stringi 0.3-1
A new release of the stringi package is available on CRAN (please wait a few days for Windows and OS X binary builds). # install.packages("stringi") or update.packages() library("stringi") stringi is an R package providing (but definitely not limited to) equivalents of nearly all the character string processing functions known from base R. While...
4628 sym R (6553 sym/13 pcs)
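To make the "equivalents of base R functions" claim concrete, here is a small side-by-side sketch. The pairings and the sample data are our own illustration and are not taken from the release notes.

```r
library(stringi)

x <- c("apple pie", "banana", NA)

# Fixed-pattern detection: base R reports FALSE for a missing value,
# whereas stringi propagates NA.
base_detect    <- grepl("an", x, fixed = TRUE)
stringi_detect <- stri_detect_fixed(x, "an")

# Fixed-pattern replacement of every occurrence.
base_replace    <- gsub(".", "-", "a.b.c", fixed = TRUE)
stringi_replace <- stri_replace_all_fixed("a.b.c", ".", "-")
```

The two replacement calls return the same string; the detection calls differ only in how they treat the missing value.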
stringi 0.4-1 released – fast, portable, consistent character string processing
A new release of the stringi package is available on CRAN (please wait a few days for Windows and OS X binary builds). # install.packages("stringi") or update.packages() library("stringi") Here’s a list of changes in version 0.4-1. In the current release, we particularly focused on making the package’s interface more consistent with that of t...
3204 sym R (1788 sym/9 pcs)
Using Hadoop Streaming API to perform a word count job in R and C++
by Marek Gagolewski, Maciej Bartoszuk, Anna Cena, and Jan Lasek (Rexamine). Introduction In a recent blog post we explained how we managed to set up a working Hadoop environment on a few CentOS 7 machines. To test the installation, let’s play with a simple example. The Hadoop Streaming API allows running Map/Reduce jobs with any programs as the mapp...
2310 sym R (2000 sym/10 pcs)
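The word count idea can be sketched in base R as a pair of functions. This is not the code from the post: the function names `map_lines` and `reduce_lines` are hypothetical, and the tokenization rule (lower-case, split on non-alphanumeric characters) is an assumption made for the example.

```r
# Mapper: emit one "word<TAB>1" record for every word in the input lines.
map_lines <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^[:alnum:]]+"))
  words <- words[nzchar(words)]          # drop empty tokens
  if (length(words) == 0L) return(character(0))
  paste(words, 1L, sep = "\t")
}

# Reducer: sum the counts of all records that share the same key.
reduce_lines <- function(records) {
  if (length(records) == 0L) return(character(0))
  parts  <- strsplit(records, "\t", fixed = TRUE)
  keys   <- vapply(parts, `[`, character(1), 1L)
  counts <- as.integer(vapply(parts, `[`, character(1), 2L))
  totals <- tapply(counts, keys, sum)
  paste(names(totals), totals, sep = "\t")
}

# Local dry run that imitates the map -> sort -> reduce pipeline.
mapped  <- map_lines(c("a b a", "b c"))
reduced <- reduce_lines(sort(mapped))
```

In an actual streaming job, each function would live in its own executable script that reads from standard input and writes to standard output, for instance `writeLines(map_lines(readLines(file("stdin"))))` for the mapper.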
Pull the (character) strings with stringi 0.5-2
A reliable string processing toolkit is a must-have for any data scientist. A new release of the stringi package is available on CRAN (please wait a few days for Windows and OS X binary builds). As of now, about 850 CRAN packages depend (either directly or recursively) on stringi. And quite recently, the package was listed among the top download...
4643 sym R (5710 sym/13 pcs)
Speeding up R packages’ installation process
"There is a time for some things, and a time for all things; a time for great things, and a time for small things." (Miguel de Cervantes) Building R packages from source may take a long time, especially if they contain a lot of C/C++/Fortran code. Long compile times might be especially frustrating if you are a package developer and you need to rec...
1479 sym R (248 sym/3 pcs) 4 img