Publications by is.R()
Dot-density maps with spsample()
Today’s example is a little odd, in that the code isn’t pretty and the example isn’t really something you’d actually produce in real life — but if you’ll overlook those oddities, you’ll find that the spsample() function, and the sp package generally, can be very useful. One of the problems with choropleth maps is the degree to which...
Evaluating term popularity with twitteR
I really wanted to put something together for this series on the twitteR package. Unfortunately, at the moment the number of interesting things that can be done with twitteR, as opposed to through API calls and RCurl, is limited. Regardless, I have Yet Another Invented Application to illustrate a pretty typical use case for twitteR: grabbing Twee...
Handling missing data with Amelia
So, what if you have data, but some of the observations are missing? Many statistical techniques assume no missingness, so we might want to “fill in” or rectangularize our data, by replacing missing observations with plausible substitutes. There are many ways of going about this, but one of the most robust and accessible is through the Amelia...
"Economics-style" graphs with bezier() from Hmisc
So, I really think this one is pretty cool. We spend much of our time in R making graphs with data, but what if you have a theory that you’d like to express graphically? Something like what I’ll call “economics-style” graphs, illustrating, for example, the Solow growth model, a production–possibility frontier, or an indifference curve? ...
US State Maps using map_data()
Today’s short post will show how to make a simple map using map_data(). Let’s assume you have data in a CSV file that may look like this: Notice the lowercase state names; they will make merging the data much easier. The variable of interest we’re going to plot is the relative incarceration rates by race (whites and blacks) across each of...
Multidimensional metric unfolding with SMACOF
SMACOF stands for “Scaling by MAjorizing a COmplicated Function,” and it is a multidimensional scaling algorithm for metric unfolding of, among other things, rectangular ratings matrices. One neat political science application of MDS is inferring ideology from survey thermometer ratings. The 2008 ANES featured 43 different thermometer stimul...
Fuzzy clustering with fanny()
This is kind of a fun example, and you might find the fuzzy clustering technique useful, as I have, for exploratory data analysis. In this Gist, I use the unparalleled breakfast dataset from the smacof package, derive dissimilarities from breakfast item preference correlations, and use those dissimilarities to cluster foods. Fuzzy clustering with...
Everything is a Network, featuring the sna package
We’ve gotten some requests, through the Ask us anything page, to do some plotting of networks. We may come back to this later, but today’s Gist shows how you can plot pretty much literally anything as a network. First, we go back to our well-worn folder of flag PNGs from GoSquared, and load data for each pixel of each flag. Then, we bina...
Text analysis made too easy with the tm package
Today’s Gist takes the CNN transcript of the Denver Presidential Debate, converts paragraphs into a document-term matrix, and does the absolute most basic form of text analysis: a raw word count. There are actually quite a few steps in this process, though it is made easier with reference to the tm vignette, but you would do well to update R, r...
Possibly slightly better text analysis with lme4
lme4 and its cousin arm are extremely useful for a huge variety of modeling applications (see Gelman and Hill’s book), but today we’re going to do something a little frivolous with them. Namely, we’re going to extend our Denver Debate analysis to include some sense of error. Instead of the term-frequency scatter plot seen in the previous po...