Publications by Randy Zwitch
A Million Text Files And A Single Laptop
Wait…What? Why? More often than I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands…even millions of tiny files. Usually when this happens, the data all have the same format (such as having been generated by sensors or other memory-constrained devices). The pro...
7127 sym R (199 sym/3 pcs) 2 img
RSiteCatalyst Version 1.4.7 (and 1.4.6.) Release Notes
It seems as though I missed the release notes for RSiteCatalyst 1.4.6, so we’ll do those and RSiteCatalyst 1.4.7 (now on CRAN) at the same time… RSiteCatalyst 1.4.6: This release was mostly tweaking some settings, specifically: Adding a second `top` argument within the Queue* functions for more control on results returned. It used to be the ...
3365 sym
Calling RSiteCatalyst From Python
“@randyzwitch Do you know if anyone has gotten RSiteCat running in a Jupyter Notebook that ran RPY2? Tired of using 2 different environments” (Adam Gitzes, @FootballActuary, February 18, 2016). This will be a very short post, because the only “new” information I’m going to provide is the minimal example to answer the question. Yes, it is in f...
1969 sym 2 img
Adobe Analytics Clickstream Data Feed: Loading To Relational Database
In my previous post about the Adobe Analytics Clickstream Data Feed, I showed how it was possible to take a single day’s worth of data and build a dataframe in R. However, most likely your analysis will require using multiple days/weeks/months of data, and given the size and complexity of the feed, loading the files into a relational database makes...
4405 sym
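As a rough illustration of the idea in the teaser above (appending several days of feed files to one relational table), here is a minimal sketch. It is not the code from the post: it uses Python with the standard-library `sqlite3` module as a stand-in for a full relational database, assumes tab-delimited files without a header row, and the function name `load_daily_files`, the table name `hits`, and the three column names are all hypothetical.

```python
import csv
import io
import sqlite3


def load_daily_files(conn, table, columns, daily_files):
    """Append rows from several tab-delimited daily files into one table.

    Assumption: every file shares the same column order and has no header row.
    Rows whose field count does not match `columns` are skipped.
    """
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    total = 0
    for handle in daily_files:
        # QUOTE_NONE: treat quote characters in the data as ordinary text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [row for row in reader if len(row) == len(columns)]
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', rows
        )
        total += len(rows)
    conn.commit()
    return total


# Hypothetical two-day sample standing in for real daily feed files.
day1 = io.StringIO("2016-01-01\t/home\tv1\n2016-01-01\t/cart\tv2\n")
day2 = io.StringIO("2016-01-02\t/home\tv3\n")

conn = sqlite3.connect(":memory:")
loaded = load_daily_files(
    conn, "hits", ["hit_date", "page", "visitor"], [day1, day2]
)
print(loaded, "rows loaded")
```

Once every day sits in the same table, multi-day questions become ordinary SQL, for example `SELECT hit_date, COUNT(*) FROM hits GROUP BY hit_date`.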
RSiteCatalyst Version 1.4.8 Release Notes
For being in RSiteCatalyst retirement, I’m ending up working on more functionality lately ¯_(ツ)_/¯. Here are the changes for RSiteCatalyst 1.4.8, which should be available on CRAN shortly: Segment Stacking: RSiteCatalyst now has the ability to take multiple values in the segment.id keyword for the Queue* functions. This functionality was gr...
2801 sym 4 img
Travis CI: “You Have Too Many Tests LOLZ!”
“No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.” As part of getting RSiteCatalyst 1.4.8 ready for CRAN, I’ve managed to accumulate hundreds of testthat tests across 63 test files. Each of these tests runs on Travis CI against an authenticated API, and the API fr...
3971 sym R (701 sym/4 pcs)
Adobe: Give Credit. You DID NOT Write RSiteCatalyst.
As an author of several open-source software projects, I’ve taken for granted that people using the software share the same community values as I do. Open-source authors provide their code “free” to the community so that others may benefit without having to re-invent the wheel. The only expectation (but not an actual requirement per se), i...
3896 sym 6 img
Adobe Analytics Clickstream Data Feed: Calculations and Outlier Analysis
In a previous post, I outlined how to load daily Adobe Analytics Clickstream data feeds into a PostgreSQL database. While this isn’t a long-term scalable solution for large e-commerce companies doing millions of page views per day, for exploratory analysis a relational database structure can work well until a more robust solution is put into pl...
2559 sym