Publications by Randy Zwitch

A Million Text Files And A Single Laptop

28.01.2016

Wait…What? Why? More often than I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands…even millions of tiny files. Usually when this happens, the data all have the same format (such as having been generated by sensors or other memory-constrained devices). The pro...

7127 sym R (199 sym/3 pcs) 2 img

RSiteCatalyst Version 1.4.7 (and 1.4.6.) Release Notes

01.02.2016

It seems as though I missed release notes for RSiteCatalyst version 1.4.6, so we’ll do those and RSiteCatalyst 1.4.7 (now on CRAN) at the same time… RSiteCatalyst 1.4.6: This release was mostly tweaking some settings, specifically: Adding a second `top` argument within the Queue* functions for more control on results returned. It used to be the ...

3365 sym

Calling RSiteCatalyst From Python

22.02.2016

“@randyzwitch Do you know if anyone has gotten RSiteCat running in a Jupyter Notebook that ran RPY2? Tired of using 2 different environments” (Adam Gitzes, @FootballActuary, February 18, 2016). This will be a very short post, because the only “new” information I’m going to provide is the minimal example to answer the question. Yes, it is in f...

1969 sym 2 img

Adobe Analytics Clickstream Data Feed: Loading To Relational Database

18.03.2016

In my previous post about the Adobe Analytics Clickstream Data Feed, I showed how it was possible to take a single day’s worth of data and build a dataframe in R. However, most likely your analysis will require using multiple days/weeks/months of data, and given the size and complexity of the feed, loading the files into a relational database makes...

4405 sym

RSiteCatalyst Version 1.4.8 Release Notes

04.04.2016

For being in RSiteCatalyst retirement, I’m ending up working on more functionality lately ¯_(ツ)_/¯. Here are the changes for RSiteCatalyst 1.4.8, which should be available on CRAN shortly: Segment Stacking: RSiteCatalyst now has the ability to take multiple values in the segment.id keyword for the Queue* functions. This functionality was gr...

2801 sym 4 img

Travis CI: “You Have Too Many Tests LOLZ!”

05.04.2016

“No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.” As part of getting RSiteCatalyst 1.4.8 ready for CRAN, I’ve managed to accumulate hundreds of testthat tests across 63 test files. Each of these tests runs on Travis CI against an authenticated API, and the API fr...

3971 sym R (701 sym/4 pcs)

Adobe: Give Credit. You DID NOT Write RSiteCatalyst.

09.05.2016

As an author of several open-source software projects, I’ve taken for granted that people using the software share the same community values as I do. Open-source authors provide their code “free” to the community so that others may benefit without having to re-invent the wheel. The only expectation (but not an actual requirement per se), i...

3896 sym 6 img

Adobe Analytics Clickstream Data Feed: Calculations and Outlier Analysis

24.05.2016

In a previous post, I outlined how to load daily Adobe Analytics Clickstream data feeds into a PostgreSQL database. While this isn’t a long-term scalable solution for large e-commerce companies doing millions of page views per day, for exploratory analysis a relational database structure can work well until a more robust solution is put into pl...

2559 sym