Publications by R on kieranhealy.org
Baby Name Animation
I was playing around with the gganimate package this morning and thought I’d make a little animation showing a favorite finding about the distribution of baby names in the United States. This is the fact—I think first noticed by Laura Wattenberg, of the Baby Name Voyager—that there has been a sharp, relatively recent rise in boys’ names e...
Earned Doctorates
PhDs awarded in selected disciplines, 2006-2016. Thierry Rossier asked me for the code to produce plots like the one above. The data come from the Survey of Earned Doctorates, a very useful resource for tracking trends in PhDs awarded in the United States. The plot is made with geom_line() and geom_label_repel(). The trick, if it can be dignifie...
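As a rough illustration of combining geom_line() with geom_label_repel(), here is a minimal sketch. The data frame, its column names, and the counts are all hypothetical (they are not Survey of Earned Doctorates figures), and labeling only the final year of each series is an assumption made for this example, not necessarily the approach the post goes on to describe.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical counts for two made-up fields; NOT real survey data.
phds <- data.frame(
  year  = rep(2006:2016, times = 2),
  field = rep(c("Field A", "Field B"), each = 11),
  n     = c(seq(500, 600, by = 10), seq(300, 250, by = -5))
)

# Assumption for this sketch: label each line once, at its last year.
last_year <- subset(phds, year == max(year))

p <- ggplot(phds, aes(x = year, y = n, color = field)) +
  geom_line() +
  # ggrepel nudges the labels so they do not overlap each other
  geom_label_repel(data = last_year, aes(label = field)) +
  guides(color = "none") +  # the labels replace the legend
  labs(x = "Year", y = "PhDs awarded")

p
```

Passing a filtered data frame to the label layer keeps the line layer on the full series while drawing one label per field.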
Back in the GSSR
The General Social Survey, or GSS, is one of the cornerstones of American social science and one of the most-analyzed datasets in Sociology. It is routinely used in research, in teaching, and as a reference point in discussions about changes in American society since the early 1970s. It is also a model of open, public data. The National Opinion R...
Parsing SDA Pages
SDA is a suite of software developed at Berkeley for the web-based analysis of survey data. The Berkeley SDA archive (http://sda.berkeley.edu) lets you run various kinds of analyses on a number of public datasets, such as the General Social Survey. It also provides consistently-formatted HTML versions of the codebooks for the surveys it hosts. Th...
Widening Multiple Columns Redux
Last year I wrote about the slightly tedious business of spreading (or widening) multiple value columns in Tidyverse-flavored R. Recent updates to the tidyr package, particularly the introduction of the pivot_wider() and pivot_longer() functions, have made this rather more straightforward to do than before. Here I recapitulate the earlier example...
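To make the idea concrete, here is a small sketch of widening two value columns at once with pivot_wider(). The data frame and its column names (group, year, mean, sd) are hypothetical and are not taken from the post's own example.

```r
library(tidyr)

# Hypothetical long data: one row per group and year, two value columns.
long <- data.frame(
  group = c("a", "a", "b", "b"),
  year  = c(2018, 2019, 2018, 2019),
  mean  = c(1.5, 2.0, 3.5, 4.0),
  sd    = c(0.1, 0.2, 0.3, 0.4)
)

# Passing several columns to values_from widens them together.
# New column names combine the value column and the year,
# e.g. mean_2018 and sd_2019.
wide <- pivot_wider(
  long,
  names_from  = year,
  values_from = c(mean, sd)
)

wide
```

The result has one row per group, with a separate mean and sd column for each year.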
Reconstructing Images Using PCA
A decade or more ago I read a nice worked example from the political scientist Simon Jackman demonstrating how to do Principal Components Analysis. PCA is one of the basic techniques for reducing data with multiple dimensions to some much smaller subset that nevertheless represents or condenses the information we have in a useful way. In a PCA ap...
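The general idea of reconstructing data from a few principal components can be sketched in base R with prcomp(). This is only an illustration under stated assumptions: the "image" here is a hypothetical matrix of random values rather than a real picture, and reconstruct() is a helper written for this example, not a function from the post or from any package.

```r
set.seed(1)

# Hypothetical stand-in for a grayscale image: a 50 x 40 numeric matrix.
img <- matrix(runif(50 * 40), nrow = 50, ncol = 40)

# Columns are treated as variables and centered before the decomposition.
pca <- prcomp(img, center = TRUE, scale. = FALSE)

# Hypothetical helper: rebuild the matrix from the first k components,
# then add the column means back.
reconstruct <- function(pca, k) {
  scores_k   <- pca$x[, 1:k, drop = FALSE]
  loadings_k <- pca$rotation[, 1:k, drop = FALSE]
  sweep(scores_k %*% t(loadings_k), 2, pca$center, "+")
}

img_5  <- reconstruct(pca, 5)   # coarse approximation
img_40 <- reconstruct(pca, 40)  # all components: recovers the original

err_5  <- mean((img - img_5)^2)
err_40 <- mean((img - img_40)^2)
c(err_5 = err_5, err_40 = err_40)
```

Using every component reproduces the input up to floating-point error, while fewer components give a lossier approximation.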
Dogs of New York
The other week I took a few publicly-available datasets that I use for teaching data visualization and bundled them up into an R package called nycdogs. The package has datasets on various aspects of dog ownership in New York City, and amongst other things you can draw maps with it at the zip code level. The package homepage has installation inst...
Reading in Data
Here’s a common situation: you have a folder full of similarly-formatted CSV or otherwise structured text files that you want to get into R quickly and easily. Reading data into R is one of those tasks that can be a real source of frustration for beginners, so I like collecting real-life examples of the many ways it’s become much easier. This...
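One way to handle this situation, sketched here in base R rather than with whatever approach the post itself takes, is to list the files, read each one, and stack the results. The folder, file names, and columns below are hypothetical and are created on the fly so the example is self-contained.

```r
# Hypothetical setup: write three small CSV files to a temporary folder.
data_dir <- file.path(tempdir(), "csv_demo")
dir.create(data_dir, showWarnings = FALSE)
for (yr in 2001:2003) {
  write.csv(
    data.frame(year = yr, value = c(1, 2)),
    file.path(data_dir, paste0("data_", yr, ".csv")),
    row.names = FALSE
  )
}

# Find every matching file, with full paths so read.csv() can open them.
files <- list.files(data_dir, pattern = "^data_.*\\.csv$", full.names = TRUE)

# Read each file, record where each row came from, and stack the results.
all_data <- do.call(rbind, lapply(files, function(f) {
  df <- read.csv(f)
  df$source_file <- basename(f)
  df
}))

str(all_data)
```

Keeping the file name as a column makes it possible to trace any row back to its source file after the data are combined.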
Cleaning the Table
While I’m talking about getting data into R this weekend, here’s another quick example that came up in class this week. The mortality data in the previous example were nice and clean coming in the door. That’s usually not the case. Data can be and usually is messy in all kinds of ways. One of the most common, particularly in the case of sum...
Dataviz Workshop at rstudio::conf
Workshop materials are available online (https://rstd.io/conf20-dataviz). Consider buying the book; it’s good: Data Visualization: A Practical Introduction. I was delighted to have the opportunity to teach a two-day workshop on Data Visualization using ggplot2 at this year’s rstudio::conf(2020) in January. It was my first time a...