Publications by Thomas Wood
Scraping Json
Loading a JSON object. Often, when you’re interested in some super complicated data presentation online and want to convert the underlying data to a nice table, there’s a super elegant way to proceed lurking underneath the site. JavaScript is the way super rich maps and graphs are built online. When you imagine: a cool map like this from the Time...
2868 sym R (1128 sym/3 pcs) 5 img
Document
Web scraping. Web scraping is the process of programmatically loading a large number of static HTML pages inside R, and turning tables or text or some other data on those static pages into tabular data we can use for stats. Scraping is a legal grey area. Many sites expressly forbid the practice–while the owners of a data set don’t mind using t...
6344 sym R (5315 sym/15 pcs) 9 img
Predicting Box Office Multiples
When does a movie have box office ‘legs’? Perhaps the most famous dictum in Hollywood production comes from the dean of American screenwriters, two time Oscar winner William Goldman: No one knows anything Not one person in the entire motion picture field knows for a certainty what’s going to work. Every time out it’s a guess—and, if ...
3821 sym R (159 sym/1 pcs) 2 img 1 tbl
Dates and Lubridate
Dates and lubridate. A couple of simple exercises to reprise the lubridate toolkit. We’ll also be using our familiar tidy tools–specifically, dplyr and purrr. First–excess mortality is a concept public health uses to measure the net effect of a pandemic, war, or some other exogenous shock to mortality data, after adjusting for the regular pattern of ...
973 sym R (445 sym/2 pcs)
Succession plans for post estimation tools
Should we abandon emmeans for marginaleffects? A couple of labs ago I provided you the memorable, while still deeply pedagogical, aphorism: you estimated an lm, but you probably want to tidy and plot an emmeans. I remain certain that, in most cases, the contrasts and p-values from an object which reports combinations of coefficients are of more...
4221 sym R (9445 sym/21 pcs) 2 img
Dplyr Reprise
dplyr–a brief reprise I feel we’ve learned a lot of dplyr, and hopefully you’ve been persuaded that this is a good toolkit for the kinds of tasks we face as bench social scientists–data manipulation before modelling. But it’s a little like learning Italian–the lab is just providing vocabulary lists and some exemplary turns of phrase...
955 sym R (150 sym/1 pcs) 1 tbl
Code Lab 10 (functional programming ii)
Functional Programming II Let’s reprise some of the purrr toolkit. It’s a very general way to solve the problems we meet in applied statistics. First, our friend map library(tidyverse) library(magrittr) library(palmerpenguins) map( .x = c(2, 4, 8, 16, 32), .f = function(i){ i ^ 2 } ) ## [[1]] ## [1] 4 ## ## [[2]] ## [1]...
4422 sym R (7055 sym/26 pcs)
Document
Graph Clinic. First, we’ll download the data, helpfully saved as a static `rds` file: library(tidyverse) library(magrittr) library(showtext) library(ggtext) library(httr) font_add_google("Roboto") t2 <- "https://github.com/thomasjwood/ps7160/raw/master/gpss/gpss_honesty_76_23.rds" %>% url %>% readRDS We need to turn some of the e...
1750 sym R (8827 sym/12 pcs) 5 img
Document
A single figure We’ll try something different this time–going methodically through a single figure, from scratch. I’ll try to move at a pace which allows us to discuss the details. In this lab, we’ll use the 2022 version of the Chicago Council Survey, which has measured foreign policy attitudes since 1974. First, we’ll download the da...
5824 sym R (20400 sym/37 pcs) 8 img
Graphics Lab I Exercise Answers
Answers to Graphics Lab exercises 1. Take the top grossing film released in every year between 2018 and 2023. For each film, plot the cumulative adjusted box office against days in release. We’ll start by loading the data library(tidyverse) library(magrittr) t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/box_office_jan_97_feb_...
1300 sym R (4776 sym/7 pcs) 3 img 1 tbl