Publications by is.R()

Dot-density maps with spsample()

07.12.2012

Today’s example is a little odd, in that the code isn’t pretty and the example isn’t really something you’d actually produce in real life — but if you’ll overlook those oddities, you’ll find that the spsample() function, and the sp package generally, can be very useful. One of the problems with choropleth maps is the degree to which...


Evaluating term popularity with twitteR

08.12.2012

I really wanted to put something together for this series on the twitteR package. Unfortunately, at the moment the number of interesting things that can be done with twitteR, as opposed to through API calls and RCurl, is limited. Regardless, I have Yet Another Invented Application to illustrate a pretty typical use-case for twitteR: grabbing Twee...


Handling missing data with Amelia

09.12.2012

So, what if you have data, but some of the observations are missing? Many statistical techniques assume no missingness, so we might want to “fill in” or rectangularize our data, by replacing missing observations with plausible substitutes. There are many ways of going about this, but one of the most robust and accessible is through the Amelia...


"Economics-style" graphs with bezier() from Hmisc

10.12.2012

So, I really think this one is pretty cool. We spend much of our time in R making graphs with data, but what if you have a theory that you’d like to express graphically? Something like what I’ll call “economics-style” graphs, illustrating, for example, the Solow growth model, a production–possibility frontier, or an indifference curve? ...


US State Maps using map_data()

11.12.2012

Today’s short post will show how to make a simple map using map_data(). Let’s assume you have data in a CSV file that may look like this: Notice the lowercase state names; they will make merging the data much easier. The variable of interest we’re going to plot is the relative incarceration rates by race (whites and blacks) across each of...


Multidimensional metric unfolding with SMACOF

12.12.2012

SMACOF stands for “Scaling by MAjorizing a COmplicated Function,” and it is a multidimensional scaling algorithm for metric unfolding of, among other things, rectangular ratings matrices. One neat Political Science application of MDS is inferring ideology from survey thermometer ratings. The 2008 ANES featured 43 different thermometer stimul...


Fuzzy clustering with fanny()

13.12.2012

This is kind of a fun example, and you might find the fuzzy clustering technique useful, as I have, for exploratory data analysis. In this Gist, I use the unparalleled breakfast dataset from the smacof package, derive dissimilarities from breakfast item preference correlations, and use those dissimilarities to cluster foods. Fuzzy clustering with...


Everything is a Network, featuring the sna package

14.12.2012

We’ve gotten some requests, through the Ask us anything page, to do some plotting of networks. We may come back to this later, but today’s Gist shows how you can plot pretty much literally anything as a network. First, we go back to our well-worn folder of flag PNGs from GoSquared, and load data for each pixel of each flag. Then, we bina...


Text analysis made too easy with the tm package

15.12.2012

Today’s Gist takes the CNN transcript of the Denver Presidential Debate, converts paragraphs into a document-term matrix, and does the absolute most basic form of text analysis: a raw word count. There are actually quite a few steps in this process, though it is made easier with reference to the tm vignette, but you would do well to update R, r...


Possibly slightly better text analysis with lme4

16.12.2012

lme4 and its cousin arm are extremely useful for a huge variety of modeling applications (see Gelman and Hill’s book), but today we’re going to do something a little frivolous with them. Namely, we’re going to extend our Denver Debate analysis to include some sense of error. Instead of the term-frequency scatter plot seen in the previous po...
