Publications by Drew Conway
Benford’s Law Tests for Wikileaks Data
In my first post on the WL Afghanistan data I provided a very high-level view of the data, and found that it generally met expectations for frequency given its context and presumed data generating process. Next, I will look a bit deeper at this process and test if the observed frequencies of reports have properties consistent with a natural data...
Wikileaks Attack Data by Year and Type Projected on Afghanistan Regional Map
Below is a visualization of the Wikileaks data produced in collaboration with Michael Dewar. This plot shows attacks in the data set by year and type, projected onto a map of Afghanistan with district boundaries. This visualization is certainly not perfect (some colors, for example, are difficult to discern), but it does provide added insight to the lev...
Animated Heatmap of WikiLeaks Report Intensity in Afghanistan
The latest visualization of the WikiLeaks data compiled by our group is an animation of the intensity of report observations in Afghanistan over the six-year period in the WikiLeaks data. Team member Mike Dewar did the vast majority of work for this visu...
Leveraging the Wisdom of Crowds for Fantasy Football
WARNING: This has nothing to do with national security, but is nonetheless awesome. This evening I will be participating in that great annual tradition which marks the transition from summer to fall: the fantasy football draft. A large part of having a successful fantasy football draft is being able to adjudicate the value of a player more accura...
In Search of Power-laws: WikiLeaks Edition
Yesterday, a commenter reminded me of the very popular hobby among scientists of searching for power-law distributions in large event data. While the commonality of scale invariance in event data is quite well known—particularly with respect to conflict data—this has not prevented many researchers from seeking and finding these patterns in d...
Where People Share Links About NYC
Last week I participated in bit.ly’s fourth hackabit hack-a-thon, which is a wonderful opportunity for NYC-area hackers to get together, eat pizza, drink energy drinks, and stay up late hacking with some of the best data geeks around. I was lucky enough to sidle up next to Hilary Mason, bit.ly’s lead scientist, recently named one of New Yor...
Co-authorship Network of SSRN Conflict Studies eJournal
As part of my ongoing research simulating network structure using graph motifs, I have been collecting novel data sets to test and benchmark the method. Since I am a political scientist studying conflict, it was suggested that I collect a co-authorship network within this sub-discipline. Such a network is useful for several reasons; for examp...
My First R Package: infochimps
I have finally taken the plunge and created my first R package! As frequent readers will know, I often sing the praises of infochimps, a startup out of Austin, TX attempting to be the world’s data clearinghouse. While infochimps is an excellent resource for data sets, they also provide their own set of excellent data APIs, which provide informat...
Fun with infochimps: Animated Blog Post Hit Map
In a few weeks I will be visiting Chicago, and JD Long—the organizer of the local R users group—has graciously invited me to give a presentation. Ostensibly, the presentation will be on my recently released infochimps package, so I thought it was a good time to start actually putting together some examples and documentation for the package. ...
Jeromy Anglim on Reproducible Research and R
Jeromy Anglim, fellow social scientist and R aficionado from across the globe, gave a great talk to the Melbourne R Users Group last week on the joys of creating reproducible results. The subject is near and dear to me, but it is not given enough attention in research training. Jeromy discusses tools for generating reproducible results, best pr...