Publications by Matt

One Line to Get and Print All Loaded Library Packages in R

24.09.2017

This function is really only useful for those of us who rapidly develop scripts to produce output, whether it is a plot in ggplot, a CSV file, or an HTML widget. Today I was working on a script that used the leaflet R package, among other libraries, and was slowly devolving into something less maintainable than I prefer. Anyhow, ...

1113 sym R (714 sym/3 pcs)

Trouble Upgrading to R 3.4 on Debian

09.10.2017

Recently had a very frustrating error on Proxmox (Debian) while trying to upgrade to R 3.4.2. The following packages have unmet dependencies: r-base-core : Depends: libjpeg8 (>= 8c) but it is not installable Depends: libpng12-0 (>= 1.2.13-4) but it is not installable Depends: libreadline6 (>= 6.0) but it is not ins...

742 sym R (1093 sym/6 pcs)

Importing Large NDJSON Files into R

08.02.2018

I ran into this problem recently when trying to import the data my Twitter scraper produced and thought this might make a worthwhile post. The file I was trying to import was ~30GB, which is absolutely monstrous. This was in part due to all of the fields I didn’t bother dropping before writing them to my data.json file. The Process The first th...

1968 sym R (1193 sym/4 pcs) 2 img

Machine Learning as a Service

17.04.2018

What is Machine Learning as a Service? With all of this news coming out about Cambridge Analytica (and how they have leveraged/weaponized data science for political purposes on a massive scale) I thought now was a good time to talk about how I see machine learning branching out into the mainstream behind the scenes. MLaaS Providers I fully expect...

5433 sym 4 img

Scraping Tables from Wikipedia for Visualizing Climate Data

25.09.2018

If anyone else is like me, eventually when looking up a future destination you will stumble across the climate data table on Wikipedia. There is a lot of great information, but if you are planning a trip you might just want to see at a glance the temperature ranges for the months you are interested in traveling. This script should help you scrape...

1821 sym R (1608 sym/7 pcs) 2 img

MySQL Data Type Mapping in R

02.10.2018

There was a recent question in the /r/Rlanguage subreddit which piqued my interest. They asked how to find the right mapping, and with the large number of data types I wondered if there was a good way to dynamically discover how fields are cast. First step is to decide how to communicate with the database. I used the package RMySQL for this. libr...

1510 sym R (1130 sym/9 pcs)

Create 3D County Maps Using Density as Z-Axis

29.11.2018

This is going to be a bit longer than some of my previous tutorials as it covers a walkthrough for sourcing data, scraping tables, cleaning, and generating the 3D view below which you can springboard from with the help of the rgl package. The heavy lifting is done with ggplot and rayshader. Rayshader rayshader is an open source R package for ...

4320 sym R (4956 sym/23 pcs) 12 img

Visualizing Bike Share Data (NiceRide)

01.03.2019

This tutorial will cover exploring and visualizing data through 2018 for the Minneapolis, MN bike sharing service NiceRide. Part of what makes R incredible is the number of great packages. Part of what makes packages like ggmap and gganimate great is how they build on existing packages. First step, as always, is to include the libraries we will ...

8159 sym R (10777 sym/26 pcs) 12 img

Bigram Analysis of Democratic Debates

30.08.2019

This tutorial will mainly focus on ggplot and bigrams, but it does gloss over clustering for a heatmap. This project started a while back, and I tweeted the plots at the beginning of this month. Life happens, I suppose. Bought a new bike, had a birthday, yadda yadda. Better late than never? I want to preface this with the disclaimer that a phrase repea...

3862 sym R (5583 sym/15 pcs) 6 img

Split Intermixed Names into First, Middle, and Last

21.10.2019

Data cleaning can be a challenge, so I hope this helps the process for someone out there. This is a tiny but valuable function for those who deal with data collected from non-ideal forms. As nearly always, this depends on the tidyverse library. You may want to rename the function from fml, but it does best describe dealing with mangled data. Thi...

1763 sym R (1804 sym/4 pcs) 3 tbl