Publications by R on kieranhealy.org
gssr Update
The General Social Survey, or GSS, is one of the cornerstones of US public opinion research and one of the most-analyzed datasets in Sociology. My colleague Steve Vaisey aptly describes it as the Hubble Space Telescope of American social science. It is routinely used in research, in teaching, and as a reference point in discussions about changes in...
Flipbookr for Quarto
{{flipbookr}} is an R package written by Gina Reynolds. It’s very useful for teaching. It was developed for use with Xaringan slides written in .Rmd files, and at present it does not work with Quarto. I hacked up a version of Flipbookr that does work with Quarto. Using it with Xaringan should be exactly the same as before. Right now it’s incomplete. I’ve just focus...
The Naming of Stats
The Naming of Stats is a difficult matter, It isn’t just one of your holiday games; You may think at first I’m as mad as a hatter When I tell you, a stat must have THREE DIFFERENT NAMES. First of all are the names where usage is informal, Such as Median, Estimate, Average, or Range, Such as Variance, Quartile, or else Standard...
Assault Deaths in the OECD 1960-2020
While we’re redoing some classics, here is the time series of assault deaths in the United States and eighteen other OECD countries from 1960 to 2020. Assault deaths in the OECD, 1960-2020. Code and data are available on GitHub.
Life Expectancy and Health Spending in the OECD
The visualization exercise of the day for class is a re-creation of a figure I first saw Lane Kenworthy make. It’s a connected scatterplot of total health spending in real terms and life expectancy of the population as a whole. The fact that real spending and expectancy tend to steadily increase for most countries in most years makes the year-to-...
Reading Remote Data Files
Sometimes data arrives as a series of individual files each of which is organized in the same way—which is to say, each of which has the same variables, features, or columns. Imagine a series of tables reporting mandated information about every school in the state, or a hundred spreadsheets each with information about a different country, or thir...
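The post’s own code isn’t reproduced in this excerpt. As a minimal base-R sketch of the general pattern, assuming a set of hypothetical CSV files that all share the same columns, you can name each file, read every one into a list, tag each table with its source, and stack the results. Here the files are written to a temporary directory so the sketch runs offline. Since `read.csv()` also accepts a URL, the same steps apply to remote files.

```r
# Hypothetical example: three small CSV files with identical columns,
# written to a temporary directory so the sketch runs without a network.
tmp <- tempdir()
states <- c("ca", "ny", "tx")
paths <- file.path(tmp, paste0("schools_", states, ".csv"))
for (i in seq_along(states)) {
  df <- data.frame(school = paste0(states[i], "_", 1:2),
                   enrollment = c(100, 200) * i)
  write.csv(df, paths[i], row.names = FALSE)
}

# Name each path so the name can travel with its data.
names(paths) <- states

# Read every file into a (named) list of data frames.
tables <- lapply(paths, read.csv)

# Add the source name as a column, then stack all the tables.
tables <- Map(function(df, id) data.frame(state = id, df, stringsAsFactors = FALSE),
              tables, names(tables))
combined <- do.call(rbind, tables)
rownames(combined) <- NULL
combined
```

Because every file has the same columns, `rbind()` can stack them directly. The added `state` column keeps track of which file each row came from.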
Escaping the Malthusian Trap
The Broadberry et al. GDP series has estimates of England’s real GDP and population from the year 1270 onwards. It’s available, along with a lot of other long-run data, from the Bank of England. Here’s an animation of the series. I sometimes use this as a scene-setter when teaching social theory. It’s great because, in addition to the basi...
Iterating on the GSS
Let’s say we’re working with the General Social Survey. We’re interested in repeatedly fitting some model each year to see how some predictor changes over time. For example, the GSS has a longstanding question named fefam, where respondents are asked to give their opinion on the following statement: It is much better for everyone involved ...
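Here is a minimal sketch of the per-year fitting idea in base R, run on simulated data. The variables `attitude` and `age` and all the values are hypothetical stand-ins, not GSS variables or results, and this is not the post’s own code. The steps are to split the data by year, fit the same model to each piece, and collect one coefficient per year.

```r
set.seed(1)

# Simulated stand-in data: NOT real GSS values. `attitude` and `age` are
# hypothetical variables; the true age slope is made to drift across years.
years <- c(1977, 1985, 1994, 2004, 2018)
sim <- do.call(rbind, lapply(years, function(yr) {
  n <- 200
  age <- sample(18:80, n, replace = TRUE)
  slope <- 0.02 - 0.0005 * (yr - 1977)
  data.frame(year = yr, age = age, attitude = slope * age + rnorm(n))
}))

# One data frame per survey year.
by_year <- split(sim, sim$year)

# Fit the same model to each year's data.
fits <- lapply(by_year, function(d) lm(attitude ~ age, data = d))

# Collect the age coefficient from each fit into a single table.
est <- data.frame(
  year = as.integer(names(fits)),
  age_coef = unname(vapply(fits, function(m) coef(m)[["age"]], numeric(1)))
)
est
```

Plotting `age_coef` against `year` then shows how the estimated effect moves over time. For a binary outcome, `glm(..., family = binomial)` would replace `lm()`. That choice, like the rest of the sketch, is illustrative.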
Indexing Iterations with set_names()
As mentioned last time, we often want to build up a data frame iteratively. The map() family of functions in purrr can help with this. Here I’ll show a handy pattern for keeping track of what you’ve added to the data frame you’re making. The map_dfr() function will take a vector, apply a function to each element of it, and then return the r...
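The post uses purrr’s `map_dfr()`. As a dependency-free sketch of the same indexing idea, the base-R code below names each element of the input vector after itself. It then applies a function to each element and row-binds the results, carrying the names along as an `id` column. The helper `summarize_n()` is hypothetical and exists only for illustration.

```r
# Hypothetical helper: summarize one input value as a one-row data frame.
summarize_n <- function(n) {
  x <- seq_len(n)
  data.frame(mean = mean(x), total = sum(x))
}

sizes <- c(5, 10, 20)

# Name each element after itself, so we know which input produced which row.
named <- setNames(sizes, sizes)

# Apply the function to every element; the list keeps the names.
results <- lapply(named, summarize_n)

# Row-bind, turning the list names into an `id` column.
out <- do.call(rbind, Map(function(df, id) data.frame(id = id, df, stringsAsFactors = FALSE),
                          results, names(results)))
rownames(out) <- NULL
out
```

With purrr loaded, the last two steps collapse to `map_dfr(named, summarize_n, .id = "id")`. The `.id` argument stores each element’s name, or its position if the input is unnamed, in the new column.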
Map and Nested Lists
On StackOverflow, a questioner with a bunch of data frames (already existing as objects in their environment) wanted to split each of them into two based on some threshold being met, or not, on a specific column. Every one of the data frames had this column in it. Their thought was that they’d write a loop, or use lapply after putting the data ...
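Here is a minimal base-R sketch of the list-based approach to that question, using hypothetical data frames that share a column called `value` and an arbitrary cutoff. It is not the answer given in the post. The idea is to gather the loose objects into one named list and then apply a split function to every element, which yields a nested list of results.

```r
# Three hypothetical data frames that all share a column named `value`.
df_a <- data.frame(id = 1:4, value = c(2, 8, 5, 11))
df_b <- data.frame(id = 1:3, value = c(9, 1, 6))
df_c <- data.frame(id = 1:2, value = c(3, 12))

# Gather them into a named list instead of looping over loose objects.
dfs <- list(a = df_a, b = df_b, c = df_c)

threshold <- 5  # hypothetical cutoff

# Split one data frame into "below" and "at_or_above" parts. Using a factor
# with both levels guarantees both parts exist, even when one is empty.
split_at <- function(df, cutoff) {
  grp <- factor(df$value >= cutoff, levels = c(FALSE, TRUE),
                labels = c("below", "at_or_above"))
  split(df, grp)
}

# Nested list: dfs_split$a$below, dfs_split$a$at_or_above, and so on.
dfs_split <- lapply(dfs, split_at, cutoff = threshold)
dfs_split$a
```

The purrr equivalent is `map(dfs, split_at, cutoff = threshold)`. Either way, keeping the pieces in a nested list makes further iteration easier than creating many new objects in the environment.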