Publications by Jan Górecki - R

Data anonymization in R

06.11.2014

Use cases: public reports; public data sharing, e.g. R package download logs from CRAN's RStudio mirror (cran-logs.rstudio.com) with masked IP addresses; reports or data sharing for an external vendor. Development work can operate on anonymized PRODUCTION data. Manually or semi-manually populated data can often bring some new issues after migration...

2262 sym R (2035 sym/16 pcs) 12 tbl

R in Business Intelligence

18.01.2015

Business Intelligence (BI) can be simply described as extracting useful information from data. This is quite a broad process, as the structure (and quality) of the source data can vary, as can the structure of the useful information. More technically, the process of such a transformation can be described as ETL (extract, transform, load), plus presentat...

7303 sym

shinyData – GUI for data analysis and reporting

18.03.2015

Some people find it very hard to start using R because it has no GUI. There exist some GUIs which offer some of the functionality of R. In this post I would like to focus on one such GUI, a very new shiny application called shinyData. I hope the app will make it easier for some to get into the R environment. It can also reduce the development time of anal...

3149 sym R (388 sym/1 pcs) 10 img

Auditing data transformation

02.06.2015

Auditing data transformation can be simply described as gathering metadata about the transformation process. The most basic metadata would be a timestamp, an atomic transformation description, data volume on input, data volume on output, and time elapsed. If you work with R only interactively you may find it more like a fancy tool. On the other hand ...

4213 sym R (1426 sym/6 pcs) 3 tbl

Data Warehousing with R

29.06.2015

Under this link you can find today's slides from the Cardiff R User Group meeting. On the slides you may find interesting packages from the Data Warehousing / ETL perspective, including a few examples and a lot of links to package repositories. The slides are fully reproducible, including connections via DBI, RJDBC and RODBC to Postgres and SQL Server. ...

771 sym

Accept payments in shiny app

03.08.2015

Have you ever thought about accepting payments in your shiny app? Probably not, but now you can start 😉 Shiny apps are usually single-task, not very heavy websites. It may not be so easy to turn them into an online shop/service provider. Anyway, you may find this post interesting as it presents a paperwork-less implementation to accept payments. S...

6210 sym R (976 sym/3 pcs) 2 img

Utilize function body inline comments for documentation

17.09.2015

When writing a long function which has to deal with multiple checks and complex processes, it is valuable to put comments in the function body. This allows readers (including you) to grasp the process workflow without going into details. I'm going to present a way in which those comments can be nicely reused for documentation purposes. ...

2125 sym R (3407 sym/6 pcs) 2 img

Utilize function body inline comments for documentation

17.09.2015

When writing a long function which has to deal with multiple checks and complex processes, it is valuable to put comments in the function body. This allows readers (including you) to grasp the process workflow without going into details. I'm going to present a way in which those comments can be nicely reused for documentation purposes. ...

2125 sym R (3402 sym/6 pcs) 2 img

Scaling data.table using index

22.11.2015

R can handle fairly big data working on a single machine: 2B (2E9) rows and a couple of columns require about 100 GB of memory. This is already large enough to care about performance. In this post I'm going to discuss the scalability of filter queries. The index was introduced to data.table in 1.9.4. It is also known as secondary keys. Unlike wit...

2954 sym R (4790 sym/7 pcs) 2 img