Publications by R on kieranhealy.org

gssr Update

02.12.2023

The General Social Survey, or GSS, is one of the cornerstones of US public opinion research and one of the most-analyzed datasets in Sociology. My colleague Steve Vaisey aptly describes it as the Hubble Space Telescope of American social science. It is routinely used in research, in teaching, and as a reference point in discussions about changes in...

Flipbookr for Quarto

10.08.2023

{{flipbookr}} is an R package written by Gina Reynolds. It’s very useful for teaching. It was developed for use with .Rmd files and Xaringan, and presently does not work with Quarto. I hacked up a version of Flipbookr that does work with Quarto. Using it with Xaringan should be exactly the same as before. Right now it’s incomplete. I’ve just focus...

The Naming of Stats

19.06.2023

The Naming of Stats is a difficult matter,
     It isn’t just one of your holiday games;
You may think at first I’m as mad as a hatter
When I tell you, a stat must have THREE DIFFERENT NAMES.
First of all are the names where usage is informal,
     Such as Median, Estimate, Average, or Range,
Such as Variance, Quartile, or else Standard...

Assault Deaths in the OECD 1960-2020

30.03.2023

While we’re redoing some classics, here is the time series of assault deaths in the United States and eighteen other OECD countries from 1960 to 2020. (Figure: Assault deaths in the OECD, 1960-2020.) Code and data are available on GitHub.

Life Expectancy and Health Spending in the OECD

29.03.2023

The visualization exercise of the day for class is a re-creation of a figure I first saw Lane Kenworthy make. It’s a connected scatterplot of total health spending in real terms and life expectancy of the population as a whole. The fact that real spending and life expectancy tend to increase steadily for most countries in most years makes the year-to-...

Reading Remote Data Files

25.03.2023

Sometimes data arrives as a series of individual files, each of which is organized in the same way; that is, each has the same variables, features, or columns. Imagine a series of tables reporting mandated information about every school in the state, or a hundred spreadsheets each with information about a different country, or thir...
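A minimal base-R sketch of the general pattern (list the files, read each one, stack the results), under stated assumptions: the three CSV files, their directory, and their column names are hypothetical and are written to a temporary folder only so the example runs. Since `read.csv()` also accepts a complete URL, a vector of remote file addresses could stand in for `files`. The post's own approach may well differ.

```r
# Hypothetical setup: three small CSV files with identical columns
data_dir <- file.path(tempdir(), "schools")
dir.create(data_dir, showWarnings = FALSE)
for (yr in 2020:2022) {
  toy <- data.frame(school = c("A", "B"), year = yr,
                    enrolled = c(100, 200) + (yr - 2020))
  write.csv(toy, file.path(data_dir, paste0("schools_", yr, ".csv")),
            row.names = FALSE)
}

# List the files, read each one, tag its rows with the source file, and stack them
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
all_schools <- do.call(rbind, lapply(files, function(f) {
  d <- read.csv(f)
  d$source <- basename(f)
  d
}))
```

Recording the file name in a `source` column keeps each row traceable to the file it came from after the tables are combined.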

Escaping the Malthusian Trap

08.01.2023

The Broadberry et al. GDP series has estimates of England’s real GDP and population from the year 1270 onwards. It’s available, along with a lot of other long-run data, from the Bank of England. Here’s an animation of the series. I sometimes use this as a scene-setter when teaching social theory. It’s great because, in addition to the basi...

Iterating on the GSS

08.04.2022

Let’s say we’re working with the General Social Survey. We’re interested in repeatedly fitting some model each year to see how some predictor changes over time. For example, the GSS has a longstanding question named fefam, where respondents are asked to give their opinion on the following statement: It is much better for everyone involved ...
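One way to sketch the fit-the-same-model-each-year idea in base R, assuming R's built-in `airquality` data as a stand-in for the GSS: `Month` plays the role of survey year, and a simple linear model of `Ozone` on `Temp` replaces whatever model one would actually fit to `fefam`. This is illustrative only and is not the approach taken in the post.

```r
# Stand-in data: airquality's Month plays the role of the GSS survey year
by_month <- split(airquality, airquality$Month)

# Fit the same model within each group and keep the coefficient on Temp
slopes <- vapply(by_month, function(d) {
  fit <- lm(Ozone ~ Temp, data = d)  # by default, lm() drops rows with missing values
  unname(coef(fit)["Temp"])
}, numeric(1))

# One row per group: the group label and its estimated slope
result <- data.frame(month = as.integer(names(slopes)),
                     temp_slope = unname(slopes))
```

Changing the model means editing only the function passed to `vapply()`; the split-fit-collect structure stays the same.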

Indexing Iterations with set_names()

10.04.2022

As mentioned last time, we often want to build up a data frame iteratively. The map() family of functions in purrr can help with this. Here I’ll show a handy pattern for keeping track of what you’ve added to the data frame you’re making. The map_dfr() function will take a vector, apply a function to each element of it, and then return the r...
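A small sketch of the `set_names()` plus `map_dfr()` pattern, assuming purrr is installed (`map_dfr()` also requires dplyr to be installed, because it row-binds its results with `dplyr::bind_rows()`). Grouping the built-in `mtcars` data by cylinder count is a hypothetical example chosen only because it ships with R.

```r
library(purrr)

# Hypothetical input vector: the distinct cylinder counts in mtcars (4, 6, 8)
cyls <- sort(unique(mtcars$cyl))

summaries <- cyls |>
  set_names() |>                 # with no nm, each element is named after itself: "4", "6", "8"
  map_dfr(function(k) {
    sub <- mtcars[mtcars$cyl == k, ]
    data.frame(n = nrow(sub), mean_mpg = mean(sub$mpg))
  }, .id = "cyl")                # element names become a "cyl" column
```

Because the input vector is named before mapping, the `.id` column records which element produced each row. Note that this column comes back as character, not numeric.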

Map and Nested Lists

27.04.2022

On Stack Overflow, a questioner with a bunch of data frames (already existing as objects in their environment) wanted to split each of them into two based on some threshold being met, or not, on a specific column. Every one of the data frames had this column in it. Their thought was that they’d write a loop, or use lapply after putting the data ...
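A base-R sketch of splitting several data frames at a threshold in one pass, assuming the data frames are first gathered into a named list. The frames `df1` and `df2`, the shared column `score`, and the cutoff of 5 are all hypothetical. `purrr::map()` could replace `lapply()` here with the same result.

```r
# Hypothetical data frames that all share a column called `score`
df1 <- data.frame(id = 1:4, score = c(2, 7, 5, 9))
df2 <- data.frame(id = 1:3, score = c(8, 1, 6))
dfs <- list(df1 = df1, df2 = df2)

threshold <- 5  # hypothetical cutoff

# For each data frame, split its rows into "below" and "at_or_above" the cutoff.
# Fixing the factor levels keeps the piece names and their order predictable.
dfs_split <- lapply(dfs, function(d) {
  group <- factor(ifelse(d$score >= threshold, "at_or_above", "below"),
                  levels = c("below", "at_or_above"))
  split(d, group)
})

# Access a piece of the nested list, e.g. the rows of df1 below the cutoff
dfs_split$df1$below
```

The result is a nested list with one element per original data frame, each holding its two pieces. Keeping them together like this, rather than creating many new top-level objects, makes them easy to iterate over later.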