Publications by hrbrmstr

Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names

26.07.2018

Continuing the blog’s UDF theme of late, there are two new UDF kids in town: drill-url-tools for slicing & dicing URI/URLs (just going to use ‘URL’ from now on in the post) and drill-domain-tools for slicing & dicing internet domain names (IDNs). Now, if you’re an Apache Drill fanatic, you’re likely thinking “Hey hrbrmstr: don’t you k...

2784 sym R (2075 sym/3 pcs) 6 img

ggplot “Doodling” with HIBP Breaches

29.07.2018

After reading this interesting analysis of “How Often Are Americans’ Accounts Breached?” by Gaurav Sood (which we need more of in cyber-land) I gave in to the impulse to do some gg-doodling with the “Have I Been Pwned” JSON data he used. It’s just some basic data manipulation with some heavy ggplot2 styling customization, so no real ne...

1018 sym R (2271 sym/1 pcs)

Digging into mbox details: A tale of tm & reticulate

04.08.2018

I had to process a bunch of emails for a $DAYJOB task this week and my “default setting” is to use R for pretty much everything (this should come as no surprise). Treating mail as data is not an uncommon task and many R packages exist that can reach out and grab mail from servers or work directly with local mail archives. Mbox’in off the ...

7073 sym R (13731 sym/12 pcs) 16 img

In-brief: splashr update + High Performance Scraping with splashr, furrr & TeamHG-Memex’s Aquarium

13.08.2018

The development version of splashr now supports authenticated connections to Splash API instances. Just specify user and pass on the initial splashr::splash() call to use your scraping setup a bit more safely. For those not familiar with splashr and/or Splash: the latter is a lightweight alternative to tools like Selenium and the former is an R in...

2829 sym R (1838 sym/2 pcs) 2 img

Introducing ‘gepetto’ — a Splash-like REST API to Headless Chrome

23.08.2018

It’s been over a year since Headless Chrome was introduced and it has matured greatly over that time and has acquired a pretty large user base. The TLDR on it is that you can now use Chrome as you would any command-line interface (CLI) program and generate PDFs, images or render javascript-interpreted HTML by supplying some simple parameters. I...

6167 sym R (898 sym/4 pcs) 14 img

Friday #rstats twofer: Finding macOS 32-bit apps & Processing Data from System Commands

24.08.2018

Apple has rung the death knell on 32-bit macOS apps and, if you’re running a recent macOS version on your Mac (which you should so you can get security updates), you likely see this alert from time to time: If you’re like me, you click through that and keep working but later ponder just how many of those apps you have. They are definitely going...

5245 sym R (3910 sym/7 pcs) 6 img

Simplifying World Tile Grid Creation with geom_wtg()

27.08.2018

Nowadays (I’ve seen that word used so much in journal articles lately that I could not resist using it) I’m using world tile grids more frequently as the need arises to convey the state of exposure of various services at a global (country) scale. Given that necessity fosters invention it seemed that having a ggplot2 geom for world tile grids ...

3198 sym R (2574 sym/3 pcs) 8 img

Driving Drill Dynamically with Docker and Updating Storage Configurations On-the-fly with sergeant

09.09.2018

The sergeant package has a minor update that adds REST API coverage for two “new” storage endpoints that make it possible to add, update and remove storage configurations on-the-fly without using the GUI or manually updating a config file. This is an especially handy feature when paired with Drill’s new, official Docker container since that...

2784 sym R (4070 sym/5 pcs) 4 img

The Evolution of Data Literacy at the U.S. Department of Energy + Finding Power Grid Cyber Attacks in a Data Haystack

12.09.2018

I was chatting with some cyber-mates at a recent event and the topic of cyber attacks on the U.S. power-grid came up (as it often does these days). The conversation was brief, but the topic made its way into active memory and resurfaced when I saw today’s Data Is Plural newsletter which noted that “Utility companies are required to report maj...

5936 sym R (12428 sym/6 pcs) 44 img

Access the Internet Archive Advanced Search/Scrape API with wayback (+ links to a new vignette & pkgdown site)

17.09.2018

The wayback package has been updated to retrieve mementos more efficiently and now supports working with the Internet Archive’s advanced search+scrape API. Search/Scrape: The search/scrape interface lets you examine the IA collections and download what you are after (programmatically). The main function is ia_scrape() but you can also pagi...

1624 sym R (2136 sym/1 pcs) 4 img