Publications by R on kieranhealy.org
Indexing Iterations with set_names()
As mentioned last time, we often want to build up a data frame iteratively. The map() family of functions in purrr can help with this. Here I’ll show a handy pattern for keeping track of what you’ve added to the data frame you’re making. The map_dfr() function will take a vector, apply a function to each element of it, and then return the r...
2158 sym R (6028 sym/14 pcs) 2 img 7 tbl
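As a rough illustration of the pattern the excerpt above describes, here is a minimal sketch. The vector `groups` and the helper `make_rows()` are hypothetical and not taken from the post; the only point is that naming the input with `set_names()` lets the `.id` argument of `map_dfr()` record which element each row came from.

```r
library(purrr)
library(tibble)

# Hypothetical input: three labels (made up for illustration)
groups <- c("a", "b", "c")

# Hypothetical helper: returns a two-row tibble for one label
make_rows <- function(label) {
  tibble(value = nchar(label) * 1:2)
}

# set_names() with no second argument names each element after itself;
# .id = "group" then stores those names in a new "group" column
out <- map_dfr(set_names(groups), make_rows, .id = "group")
```

Without the `set_names()` step the vector has no names, so the `.id` column would hold only the position of each element rather than its label.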
Map and Nested Lists
On StackOverflow, a questioner with a bunch of data frames (already existing as objects in their environment) wanted to split each of them into two based on some threshold being met, or not, on a specific column. Every one of the data frames had this column in it. Their thought was that they’d write a loop, or use lapply after putting the data ...
5004 sym R (10079 sym/22 pcs) 11 tbl
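One way to approach the question in the excerpt above, sketched under assumptions: the data frames are first collected into a named list, the shared column is called `score`, and the cutoff is 5. All of these names and values are hypothetical, and this is not necessarily the solution the post arrives at.

```r
library(purrr)

# Hypothetical data frames sharing a `score` column (values are made up)
dfs <- list(
  df1 = data.frame(id = 1:4, score = c(2, 8, 5, 9)),
  df2 = data.frame(id = 1:3, score = c(7, 1, 6))
)
threshold <- 5  # assumed cutoff

# Split one data frame into rows at or above the cutoff and rows below it
split_on_threshold <- function(df, cutoff) {
  list(
    above = df[df$score >= cutoff, , drop = FALSE],
    below = df[df$score < cutoff, , drop = FALSE]
  )
}

# map() returns a nested list: one element per data frame,
# each holding an `above` and a `below` data frame
nested <- map(dfs, split_on_threshold, cutoff = threshold)
```

Individual pieces can then be reached by name, for example `nested$df1$above`.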
Unhappy in its Own Way
“Happy families are all alike; every unhappy family is unhappy in its own way” runs the opening sentence of Anna Karenina. Hadley Wickham echoes the sentiment in a somewhat different context: “Tidy datasets are all alike, but every messy dataset is messy in its own way”. Data analysis is mostly data wrangling. That is, before you can do a...
19471 sym R (21138 sym/64 pcs) 32 tbl
Filling Ordered Facets From the Bottom Row
On Twitter the other day, Philip Cohen put up some data on changes in Bachelor’s degrees awarded between 1995 and 2015. The data come from the National Center for Education Statistics. It seemed like a good candidate for drawing as a figure, so I had a go at it: Changes in the number of Bachelor’s degrees awarded over the past twenty years. ...
4123 sym R (3656 sym/5 pcs) 10 img
US Monthly Births
Yesterday I came across Aaron Penne’s collection of very nice data visualizations, one of which was of monthly births in the United States since 1933. He made a tiled heatmap of the data, taking care when calculating the average rate to correct for the varying number of days in different months. Aaron works in Python, so I took the opportunity ...
2239 sym 8 img
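The correction for month length mentioned in the excerpt above can be sketched in base R. The monthly totals below are made-up numbers, not the actual US data, and `days_in_month()` is a hypothetical helper written for this example.

```r
# Hypothetical monthly totals (made-up numbers, not the real data)
births <- data.frame(
  year  = c(2019L, 2019L, 2020L, 2020L),
  month = c(1L, 2L, 1L, 2L),
  total = c(31000, 28000, 31000, 29000)
)

# Days in a month: the gap between the first of the month
# and the first of the following month
days_in_month <- function(year, month) {
  start <- as.Date(sprintf("%d-%02d-01", year, month))
  next_year  <- ifelse(month == 12L, year + 1L, year)
  next_month <- ifelse(month == 12L, 1L, month + 1L)
  end <- as.Date(sprintf("%d-%02d-01", next_year, next_month))
  as.integer(end - start)
}

births$days    <- days_in_month(births$year, births$month)
births$per_day <- births$total / births$days
```

Dividing by the number of days puts a 28-day February and a 31-day January on the same footing, and the helper accounts for leap years because it works from actual calendar dates.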
Animated Population Pyramids in R
Amateur demography week continues around here. Today we are looking at the population of England and Wales since 1961, courtesy of some data from the UK Office for National Statistics. We have data on population counts by age (in nice, detailed, yearly increments) broken down by sex. We’re going to tidy the data, make a pyramid for a year, and t...
3933 sym R (6382 sym/7 pcs) 6 img
Visualizing the Baby Boom
To close out what has become demography week, I combined the US monthly birth data with data for England and Wales (from the same ONS source as before), so that I could look at the trends together. The monthly England and Wales data I have to hand runs from 1938 to 1991. I thought combining the monthly tiled heatmap and the LOESS decomposition wo...
2864 sym 2 img
Spreading Multiple Values
Earlier this year my colleague Steve Vaisey was converting code in some course notes from Stata to R. He asked me a question about tidily converting from long to wide format when you have multiple value columns. This is a little more awkward than it should be, and I’ve run into the issue several times since then. I’m writing down the answer (...
3701 sym R (6562 sym/8 pcs)
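For readers on a recent tidyr (1.0.0 or later), the long-to-wide conversion with several value columns can be sketched with `pivot_wider()`. This is an assumption about current tooling rather than the answer given in the post, whose title suggests it works with the older `spread()` function. The data frame and its column names are hypothetical.

```r
library(tidyr)

# Hypothetical long-format data with two value columns (made-up values)
long <- data.frame(
  id    = c(1, 1, 2, 2),
  wave  = c("w1", "w2", "w1", "w2"),
  avg   = c(10, 12, 20, 22),
  count = c(5, 6, 7, 8)
)

# Passing several columns to values_from widens them all at once;
# new column names combine the value column and the wave label
wide <- pivot_wider(long, names_from = wave, values_from = c(avg, count))
```

The result has one row per `id`, with columns `avg_w1`, `avg_w2`, `count_w1`, and `count_w2`.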
Congress Over Time
Since the U.S. midterm elections I’ve been playing around with some Congressional Quarterly data about the composition of the House and Senate since 1945. Unfortunately I’m not allowed to share the data, but here are two or three things I had to do with it that you might find useful. The data comes as a set of CSV files, one for each congress...
4818 sym R (16479 sym/12 pcs) 8 img
Zero Counts in dplyr
Here’s a feature of dplyr that occasionally bites me (most recently while making these graphs). It’s about to change mostly for the better, but is also likely to bite me again in the future. If you want to follow along there’s a GitHub repo with the necessary code and data. Say we have a data frame or tibble and we want to get a frequency t...
5445 sym R (5949 sym/9 pcs) 8 img
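The zero-count issue can be sketched as follows. The example assumes dplyr 0.8.0 or later, where `count()` accepts a `.drop` argument; whether that is exactly the change the excerpt refers to is an assumption. The data frame is hypothetical.

```r
library(dplyr)

# Hypothetical data: a factor with a level ("c") that never occurs
df <- data.frame(
  grp = factor(c("a", "a", "b"), levels = c("a", "b", "c"))
)

# Default behaviour: the empty level is omitted from the table
dropped <- count(df, grp)

# With .drop = FALSE the empty factor level is kept with a count of zero
kept <- count(df, grp, .drop = FALSE)
```

Note that this only helps when the grouping variable is a factor; a character column carries no record of values that never appear.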