Publications by Marek Gągolewski

ICU Unicode text transforms in the R package stringi

20.05.2014

The ICU (International Components for Unicode) library provides powerful and flexible ways to apply various Unicode text transforms. These include full (language-specific) case mappings, Unicode normalization, and text transliteration (e.g., script-to-script conversion). All of these are available to R programmers and users via our still maturing ...
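A short sketch of the three kinds of transforms listed above, assuming the stringi package is installed. The functions used here (stri_trans_toupper(), stri_trans_nfd(), stri_trans_nfc(), stri_trans_general()) and the sample strings are our own illustration, not taken from the post itself.

```r
# Assumes the stringi package is installed (install.packages("stringi"))
library("stringi")

# Full, language-specific case mapping:
# the German sharp s becomes two letters in upper case...
print(stri_trans_toupper("stra\u00dfe"))               # "STRASSE"
# ...and in Turkish, "i" maps to a dotted capital I (U+0130)
print(stri_trans_toupper("i", locale = "tr_TR"))

# Unicode normalization: "é" as one code point vs. "e" + combining acute accent
decomposed <- stri_trans_nfd("\u00e9")
print(stri_length(decomposed))                         # 2
print(stri_trans_nfc(decomposed) == "\u00e9")          # TRUE

# Transliteration via an ICU transform identifier
print(stri_trans_general("G\u0105golewski", "Latin-ASCII"))  # "Gagolewski"
```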

Faster, easier, and more reliable character string processing with stringi 0.3-1

06.11.2014

A new release of the stringi package is available on CRAN (please wait a few days for Windows and OS X binary builds).

# install.packages("stringi") or update.packages()
library("stringi")

stringi is an R package providing (but definitely not limited to) equivalents of nearly all the character string processing functions known from base R. While...
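To make the point about equivalents of base R functions concrete, here is a small check pairing a few base R calls with stringi functions that should return the same results on plain ASCII input. The particular pairs are our own choice rather than a list from the post, and the snippet assumes stringi is installed.

```r
# Assumes the stringi package is installed
library("stringi")

x <- c("stringi", "base R", "ICU")

# Base R call on the left, stringi counterpart on the right.
# stringi functions take the input strings as their first argument.
stopifnot(
  identical(nchar(x),                        stri_length(x)),
  identical(grepl("i", x, fixed = TRUE),     stri_detect_fixed(x, "i")),
  identical(gsub("i", "1", x, fixed = TRUE), stri_replace_all_fixed(x, "i", "1")),
  identical(toupper(x),                      stri_trans_toupper(x)),
  identical(strsplit("a,b,c", ",", fixed = TRUE),
            stri_split_fixed("a,b,c", ","))
)
```

Note the argument order: grepl() and gsub() take the pattern first, whereas stringi consistently puts the strings being processed first, which keeps calls uniform across the whole family of functions.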

stringi 0.4-1 released – fast, portable, consistent character string processing

14.12.2014

A new release of the stringi package is available on CRAN (please wait a few days for Windows and OS X binary builds).

# install.packages("stringi") or update.packages()
library("stringi")

Here’s a list of changes in version 0.4-1. In the current release, we particularly focused on making the package’s interface more consistent with that of t...

Using Hadoop Streaming API to perform a word count job in R and C++

25.02.2015

by Marek Gagolewski, Maciej Bartoszuk, Anna Cena, and Jan Lasek (Rexamine).

Introduction

In a recent blog post, we explained how we managed to set up a working Hadoop environment on a few CentOS 7 machines. To test the installation, let’s play with a simple example. The Hadoop Streaming API allows one to run Map/Reduce jobs with any programs as the mapp...
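A minimal sketch of a word count mapper and reducer in R, following the usual Hadoop Streaming convention: the mapper writes tab-separated key/value lines to stdout, and the reducer receives those lines sorted by key. This is not the authors' code (the post also covers a C++ version); the script name wordcount.R and the map/reduce command-line switch are hypothetical, and words are restricted to ASCII letters for simplicity.

```r
# Hypothetical word count script for Hadoop Streaming, e.g. saved as wordcount.R
# (a sketch, not the original post's implementation).

# Mapper: split lines into lower-case words (ASCII letters only, a simplifying
# assumption) and emit one "word<TAB>1" line per word.
map_words <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z]+"))
  words <- words[words != ""]
  if (length(words) == 0) return(character(0))
  paste0(words, "\t1")
}

# Reducer: Hadoop delivers "word<TAB>count" lines sorted by key, so equal
# words are adjacent; sum the counts within each run of identical keys.
reduce_counts <- function(lines) {
  if (length(lines) == 0) return(character(0))
  parts  <- strsplit(lines, "\t", fixed = TRUE)
  keys   <- vapply(parts, `[`, character(1), 1)
  counts <- as.integer(vapply(parts, `[`, character(1), 2))
  runs   <- rle(keys)                                  # runs of equal keys
  group  <- rep(seq_along(runs$lengths), runs$lengths) # run index per line
  totals <- vapply(split(counts, group), sum, integer(1))
  paste0(runs$values, "\t", totals)
}

# Run as "Rscript wordcount.R map" or "Rscript wordcount.R reduce",
# reading from stdin and writing to stdout.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 1 && args %in% c("map", "reduce")) {
  con <- file("stdin")
  input <- readLines(con)
  close(con)
  output <- if (args == "map") map_words(input) else reduce_counts(input)
  writeLines(output)
}
```

Outside Hadoop, the same pipeline can be tried locally with something like cat input.txt | Rscript wordcount.R map | sort | Rscript wordcount.R reduce, where the sort step stands in for Hadoop's shuffle-and-sort phase.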

Pull the (character) strings with stringi 0.5-2

23.06.2015

A reliable string processing toolkit is a must-have for any data scientist. A new release of the stringi package is available on CRAN (please wait a few days for Windows and OS X binary builds). As of now, about 850 CRAN packages depend (either directly or recursively) on stringi. Quite recently, the package was also listed among the top download...

Speeding up R packages’ installation process

05.07.2015

There is a time for some things, and a time for all things; a time for great things, and a time for small things.
– Miguel de Cervantes

Building R packages from sources may take a long time, especially if they contain a lot of C/C++/Fortran code. Long compile times can be especially frustrating if you are a package developer and you need to rec...