Publications by hrbrmstr
Identify & Analyze Web Site Tech Stacks With rappalyzer
Modern websites are complex beasts. They house photo galleries, interactive visualizations, web fonts, analytics code and other diverse types of content. Despite the potential for diversity, many web sites share similar “tech stacks” — the components that come together to make them what they are. These stacks consist of web servers (often w...
9363 sym R (6042 sym/11 pcs) 2 img
Retrieve & process TV News chyrons with newsflash
The Internet Archive recently announced a new service they’ve dubbed ‘Third Eye’. This service scrapes the chyrons that annoyingly scroll across the bottom-third of TV news broadcasts. IA has a vast historical archive of TV news that they’ll eventually process, but — for now — the more recent broadcasts from four channels are readily ...
3609 sym R (9983 sym/6 pcs) 2 img
Enabling Concerned Visitors & Ethical Security Researchers with security.txt Web Security Policies (plus analyze them at-scale with R)
I’ve blogged a bit about robots.txt, the rules file that documents a site’s “robots exclusion” standard that instructs web crawlers what they can and cannot do (and how frequently they should do things when they are allowed to). This is a well-known and well-defined standard, but it’s not mandatory and often ignored by crawlers and cont...
3915 sym R (1003 sym/4 pcs)
A Call to Tweets (& Blog Posts)!
Way back in July of 2009, the first version of the twitteR package was published by Jeff Gentry on CRAN. Since then it has seen 28 updates, finally breaking the 0.x.y barrier into 1.x.y territory in March of 2013 and receiving its last update in July of 2015. For a very long time, the twitteR package was the way to siphon precious nuggets of ...
4137 sym R (7310 sym/2 pcs) 6 img
gg_tweet’ing Power Outages
As many folks know, I live in semi-rural Maine and we were hit pretty hard with a wind+rain storm Sunday to Monday. The hrbrmstr compound had no power (besides a generator) and no stable/high-bandwidth internet (Verizon LTE was heavily congested) since 0500 Monday and still does not as I write this post. I’ve played with scraping power outage d...
5353 sym R (3171 sym/6 pcs)
Yet-Another-Power Outages Post : Full Tidyverse Edition
This past weekend, violent windstorms raged through New England. We — along with over 500,000 other Mainers — went “dark” in the wee hours of Monday morning and (this post was published on Thursday AM) we still have no utility-provided power nor high-speed internet access. The children have turned iFeral, and being a remote worker has bee...
3586 sym R (5392 sym/5 pcs) 4 img
I, For One, Welcome Our Forthcoming New robots.txt Overlords
Despite my week-long Twitter consumption sabbatical (helped — in part — by the nigh week-long internet and power outage here in Maine), I still catch useful snippets from folks. My cow-orker @dabdine shunted a tweet by @terrencehart into a Slack channel this morning, and said tweet contained a link to this little gem. Said gem is the text of ...
6092 sym 2 img
Taking a Shot at cdcfluview v0.7.0 (a.k.a. The Dangers of Relying on ‘Hidden’ APIs)
Unlike @noamross, I am not an epidemiologist (NOTE: Noam battles pandemics before breakfast, so be super nice to him) but I do like to find kindred methodologies in other disciplines to help foster the growth of cybersecurity into something beyond its current Barnum & Bailey state. I also love finding and exposing hidden APIs and especially en...
6171 sym 12 img
Measuring & Monitoring Internet Speed with R
Working remotely has many benefits, but if you work remotely in an area like, say, rural Maine, one of those benefits is not massively speedy internet connections. Being able to go fast and furious on the internet is one of the many things I miss about our time in Seattle and it is unlikely that we’ll be seeing Google Fiber in my small town any...
5409 sym R (5259 sym/4 pcs) 6 img
Twitter Outer Limits : Seeing How Far Have Folks Fallen Down The Slippery Slope to “280” with rtweet
By now, virtually every major media outlet has covered the “280 Apocalypse”™. For those still not “in the know”, Twitter recently moved the tweet character cap to 280 after a “successful” beta test (some of us have different ideas of what “success” looks like). I had been on a hiatus from the platform for a while and planned to ...
2493 sym R (2254 sym/1 pcs) 2 img