Publications by Drew Conway

Benford’s Law Tests for Wikileaks Data

01.08.2010

In my first post on the WL Afghanistan data I provided a very high-level view of the data, and found that it generally met expectations for frequency given its context and presumed data generating process. Next, I will look a bit deeper at this process and test whether the observed frequencies of reports have properties consistent with a natural data...

Wikileaks Attack Data by Year and Type Projected on Afghanistan Regional Map

07.08.2010

Below is a visualization of the Wikileaks data produced in collaboration with Michael Dewar. This plot shows attacks in the data set by year and type, projected onto a map of Afghanistan with district boundaries. This visualization is certainly not perfect, e.g., some colors are difficult to discern, but it does provide added insight into the lev...

Animated Heatmap of WikiLeaks Report Intensity in Afghanistan

17.08.2010

The latest visualization of the WikiLeaks data compiled by our group is an animation of the intensity of report observations in Afghanistan over the six-year period in the WikiLeaks data. Team member Mike Dewar did the vast majority of the work for this visu...

Leveraging the Wisdom of Crowds for Fantasy Football

23.08.2010

WARNING: This has nothing to do with national security, but is nonetheless awesome. This evening I will be participating in that great annual tradition which marks the transition from Summer to Fall: the fantasy football draft. A large part of having a successful fantasy football draft is being able to adjudicate the value of a player more accura...

In Search of Power-laws: WikiLeaks Edition

26.08.2010

Yesterday, a commenter reminded me of the very popular hobby among scientists of searching for power-law distributions in large event data. While the commonality of scale invariance in event data is quite well known—particularly with respect to conflict data—this has not prevented many researchers from seeking and finding these patterns in d...

Where People Share Links About NYC

27.10.2010

Last week I participated in bit.ly’s fourth hackabit hack-a-thon, which is a wonderful opportunity for NYC-area hackers to get together, eat pizza, drink energy drinks, and stay up late hacking with some of the best data geeks around. I was lucky enough to sidle up next to Hilary Mason, bit.ly’s lead scientist, recently named one of New Yor...

Co-authorship Network of SSRN Conflict Studies eJournal

10.11.2010

As part of my ongoing research simulating network structure using graph motifs, I have been collecting novel data sets to test and benchmark the method. Since I am a political scientist studying conflict, it was suggested to me to collect a co-authorship network within this sub-discipline. Such a network is useful for several reasons; for examp...

My First R Package: infochimps

20.11.2010

I have finally taken the plunge and created my first R package! As frequent readers will know, I often sing the praises of infochimps, a startup out of Austin, TX attempting to be the world’s data clearinghouse. While infochimps is an excellent resource for data sets, they also provide their own set of excellent data APIs, which provide informat...

Fun with infochimps: Animated Blog Post Hit Map

03.12.2010

In a few weeks I will be visiting Chicago, and JD Long—the organizer of the local R users group—has graciously invited me to give a presentation. Ostensibly, the presentation will be on my recently released infochimps package, so I thought it was a good time to start actually putting together some examples and documentation for the package. ...

Jeromy Anglim on Reproducible Research and R

06.12.2010

Jeromy Anglim, fellow social scientist and R aficionado from across the globe, gave a great talk to the Melbourne R Users Group last week on the joys of creating reproducible results. It is a subject near and dear to me, but one that is not given enough attention in research training. Jeromy discusses tools for generating reproducible results, best pr...
