Publications by Jason Timm
twitter users, demographic inference & reticulate
A simple code-through for using the Python library m3inference in R via reticulate. The library, described in Wang et al. (2019), "Demographic Inference and Representative Population Estimates from Multilingual Social Media Data," facilitates inference of the demographic attributes of Twitter users, namely gender, age, and organizational status, based on pr...
a census-based approach to spanish language maintenance
Contents: Census nuts/bolts; New Mexico & the US; Some macro-exploration; A simple model; Some final notes; References
In this post we investigate Spanish language maintenance within Hispanic communities in the US utilizing data from the US Census. Spanish language maintenance refers to the rate at which Hispanics within a given community speak Spanish. Here, ...
corpus query and grammatical constructions
Contents: Search syntax; Corpus search; Search summary; KWIC & BOW; Summary and shiny
This post demonstrates the use of a simple collection of functions from my R package corpuslingr. The functions streamline two sets of corpus linguistics tasks: annotated corpus search of grammatical constructions and complex lexical patterns in context, and detailed summary an...
a simple framework for corpus-based keyphrase extraction
Contents: Defining potential keyphrases; Corpus search for potential keyphrases; Selecting descriptive keyphrases with the tf-idf statistic; Post script – State of the Union Addresses
This post outlines a simple framework for identifying and extracting keyphrases from the component texts of a corpus. We first consider some functional characteristics of descri...
locating linguistic diversity in the usa
Contents: Language data and the census; Languages in the US; Linguistic diversity as entropy; Locating linguistic diversity; FIN
This post investigates linguistic diversity in the United States utilizing data made available by the US Census. We consider census language classifications and introduce a simple methodology for quantifying linguistic diversity us...
topic models for synchronic & diachronic corpus exploration
Contents: Synchronic application; Diachronic application; Topic clusters; quick summary; References
This post outlines a fairly simple workflow from annotated corpus to topic model, with a focus on the exploratory utility of topic models. We first consider some text structures relevant to topic modeling in R, and then demonstrate some approaches to visualizin...
place from text: geography & distributional semantics
Contents: From text to map; Corpus search and context; LSA, MDS, and semantic space; FIN
In this post, we demonstrate several methodologies for exploring the geographical information found in text. First, we address some of the practical issues of extracting places/place-names from an annotated corpus, and demonstrate how to (1) map their geospatial d...
building historical socio-demographic profiles
Contents: Some preliminaries; Socio-economic profiles; Age distribution profiles; Summary
This post demonstrates a simple workflow for building census-based, historical socio-demographic profiles using the R package tidycensus. The goal is to outline a reproducible method for quick visual exploration of trend data made available via the American Community Su...
psychological and geographical distance in text
Contents: Concreteness ratings and the lexvarsdatr package; Context & concreteness scores; Geographical distance; FIN; References
This post considers a super-clever study presented in Snefjella and Kuperman (2015), in which the authors investigate the relationship between psychological distance and geographical distance using geolocated tweets. General idea/h...
New Mexico’s 53rd State Legislature
Contents: Package descriptives; NMSL53: an overview; Attendance & party loyalty; Health care-related roll calls; Roll call details; Incorporating census data; Summary; Postscript: Visualizing congressional composition
In this post, we introduce a new R data package, nmlegisdatr, that makes available roll call data for New Mexico’s 53rd (2017-18) State Legislat...