Publications by hrbrmstr
Convert epub to Text for Processing in R
@RMHoge asked the following on Twitter: “Hello #rstats hyve mind! Is there a package that reads epub into R? I can not find any, I now convert to text and parse the text but you sort of lose the structure of the text. Pinging @dataandme @hrbrmstr” (Roel (@RMHoge), April 12, 2018). Here’s one way to do that which doesn’t rely on pandoc (pando...
1811 sym R (5531 sym/5 pcs) 4 img
Does Congress Really Care About Your Privacy?
I apologize up-front for using bad words in this post. Said bad words include “Facebook”, “Mark Zuckerberg” and many referrals to entities within the U.S. Government. Given the topic, it cannot be helped. I’ve also left the R tag on this despite only showing some ggplot2 plots and Markdown tables. See the end of the post for how to get ...
14860 sym 18 img 4 tbl
By Request: Retrieving Your Feedly “Saved for Later” Entries
@mkjcktzn asked if one can access Feedly “Saved for Later” items via the API. The answer is “Yes!”, and it builds off of that previous post. You’ll need to read it and get your authentication key (still no package) before continuing. We’ll use most (I think “all”) of the code from the previous post, so let’s bring that over her...
1630 sym R (3756 sym/4 pcs) 2 img
Examining POTUS Executive Orders
This week’s edition of Data is Plural had two really fun data sets. One is serious fun (the first comprehensive data set on U.S. evictions), and the other I knew about but had forgotten: the Federal Register Executive Order (EO) data set(s). The EO data is also comprehensive as the summary JSON (or CSV) files have links to more metadata and even...
2891 sym R (5519 sym/3 pcs) 4 img
Painless ODBC + dplyr Connections to Amazon Athena and Apache Drill with R & odbc
I spent some time this morning upgrading the JDBC driver (and changing up some supporting code to account for changes to it) for my metis package which connects R up to Amazon Athena via RJDBC. I’m used to JDBC and have to deal with Java separately from R so I’m also comfortable with Java, JDBC and keeping R working with Java. I notified the ...
3918 sym R (3163 sym/2 pcs) 4 img
Seventeen Minutes From Tweet To Package
Earlier today, @noamross posted to Twitter: “#rstats #lazyweb What's the R/httr/curl equivalent of curl -F "file=@somefile.html" https://t.co/abbugLz9ZW” (Noam Ross (@noamross), May 3, 2018). The answer was a 1:1 “file upload” curl to httr translation: httr::POST( url = "https://file.io", encode = "multipart", body = list(file =...
2226 sym R (370 sym/2 pcs)
Wrangling Data Table Out Of the FBI 2017 IC3 Crime Report
The U.S. FBI Internet Crime Complaint Center was established in 2000 to receive complaints of Internet crime. They produce an annual report and just released 2017’s edition, and I need the data from it. Since I have to wrangle it out, I thought some folks might like to play along at home, especially since it turns out I had to use both tabulizer an...
2800 sym R (22796 sym/15 pcs) 14 img
‘LMX ot NOSJ!’ Interchanging Classic Data Formats With Single `blackmagic` Incantations
The D.C. Universe magic hero Zatanna used spells (i.e. incantations) to battle foes and said spells were just sentences said backwards, hence the mixed up jumble in the title. But, now I’m regretting not naming the package zatanna and reversing the function names to help ensure they’re only used deliberately & carefully. You’ll see why in a...
5432 sym R (6408 sym/5 pcs) 8 img
Create Code Metrics with cloc
The cloc Perl script (yes, Perl!) by Al Danial (https://github.com/AlDanial/cloc) has been one of the go-to tools for generating code metrics. Given a single file, directory tree, archive, or git repo, cloc can speedily give you metrics on the count of blank lines, comment lines, and physical lines of source code in a vast array of programming la...
2174 sym R (2930 sym/3 pcs) 8 img
The Power of Standards and Consistency
I’m going to (eventually) write a full post on the package I’m mentioning in this one: osqueryr. The TLDR on osqueryr is that it is an R DBI wrapper (that has just enough glue to also be plugged into dbplyr) for osquery. The TLDR on osquery is that it “exposes an operating system as a high-performance relational database. This design allow...
8162 sym R (3181 sym/3 pcs) 10 img