Publications by Tony

Statistical Analysis with R, a Review

12.02.2011

[To all of the R-bloggers out there who recognize this, I apologize. To those who don’t: this is at least the fifth review of this book to go on the feed. The author is linking to the others here.] Long Version: I have a Bachelor’s degree in Computer Science. I’m pretty handy when it comes to programming. So when I look for a book ab...

Using R for Stata to CSV Conversion

03.06.2011

I recently found myself in the unpleasant situation of needing to read a Stata .dta file without having Stata readily available. Normally, I’d fire up a text editor and deconstruct the file, but Stata saves its data in a proprietary binary format, so a text editor garbles some of the file’s contents. Luckily, the R foreign library ...
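
A minimal sketch of the conversion, assuming the foreign package that ships with R; the excerpt is cut off before the author’s code, so this is not necessarily how the post does it. The file paths are temporary placeholders, and a tiny .dta file is written first so the example runs without Stata. Note that foreign’s read.dta() reads the older Stata formats (versions 5 to 12); files saved by Stata 13 or later generally need a different reader, such as haven::read_dta().

```r
# Sketch: convert a Stata .dta file to CSV with the foreign package.
# The data and file paths below are hypothetical placeholders.
library(foreign)

dta_to_csv <- function(dta_path, csv_path) {
  # read.dta() parses the binary .dta format into an R data frame
  df <- read.dta(dta_path)
  # write.csv() emits plain text; row.names = FALSE avoids an extra index column
  write.csv(df, csv_path, row.names = FALSE)
  invisible(df)
}

# Write a small example .dta file so the sketch is self-contained
example <- data.frame(country = c("A", "B"), score = c(1.5, 2.5),
                      stringsAsFactors = FALSE)
dta_file <- tempfile(fileext = ".dta")
csv_file <- tempfile(fileext = ".csv")
write.dta(example, dta_file)

dta_to_csv(dta_file, csv_file)
readLines(csv_file)  # inspect the resulting plain-text CSV
```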

Analyzing the Failed States Index (with Polity IV)

07.07.2011

So, I decided to sit down and have a little fun with that Failed States Index data I put together. To start, I expected the dataset to be fairly linearly correlated with the Polity IV data. This makes sense: true democracies aren’t failed states, and failed states tend not to be democratic. To test this, I merged the two datasets for 2010...
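
The merge-then-correlate step can be sketched in base R as follows. This is an illustration, not the author’s code: the country names, the column names (fsi_total, polity2), and all values are hypothetical placeholders rather than real FSI or Polity IV data.

```r
# Sketch: join two country-level datasets and check for a linear relationship.
# Countries, column names, and values are hypothetical placeholders.
fsi <- data.frame(
  country   = c("Alphaland", "Betastan", "Gammaria", "Deltavia"),
  fsi_total = c(95.2, 60.1, 30.4, 75.8),
  stringsAsFactors = FALSE
)
polity <- data.frame(
  country = c("Alphaland", "Betastan", "Gammaria", "Epsilonia"),
  polity2 = c(-7, 5, 10, 8),
  stringsAsFactors = FALSE
)

# merge() is an inner join on the key by default, so countries present in
# only one dataset (Deltavia, Epsilonia) are dropped from the result
merged <- merge(fsi, polity, by = "country")

# Pearson correlation and a simple linear fit of the FSI score on the Polity score
cor(merged$fsi_total, merged$polity2)
fit <- lm(fsi_total ~ polity2, data = merged)
summary(fit)
```

Passing all = TRUE to merge() would instead keep unmatched countries with NA values, which is useful for spotting country names that are spelled differently in the two sources.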

More fun with the Failed States Index (and the State Fragility Index)

09.07.2011

So the other day’s experiment with the Failed States Index and the Polity data didn’t yield the linear trend I had originally expected. After all, the two indices measure fundamentally distinct things. But perhaps another dataset will match linearly. The same people who made Polity also put out a dataset called the State Fragili...

Measuring the EIU Democracy Index (with Polity IV)

12.07.2011

Yet again, I have conjured up an (academically) unusual dataset on democracy! This time it’s the Economist Intelligence Unit’s Democracy Index, a weird little gem. The dataset is the basis for a paper the Economist publishes every two years. Because of this biennial schedule, there are data estimating the “Democratic-ness” of the world’s c...

Musings on Correlation (or yet another reason I fear for those non-methodologically inclined students in my cohort)

12.08.2011

I’ve been thinking a lot about what it means for two variables to be correlated. Scientists throw the term around as if it were uniformly understood, but I fear the concept eludes substantive researchers who aren’t interested in empirical methods, except as a means by which we can demonstrate that our hypothese...
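
To make the concept concrete, here is a small sketch (not taken from the post) that computes Pearson’s r from its definition, the covariance divided by the product of the standard deviations, and compares it with base R’s cor(). The second pair shows a classic trap: y = x^2 is completely determined by x, yet r is 0, because Pearson’s r only measures linear association.

```r
# Pearson's r from its definition:
# sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x <- -5:5
y_linear <- 2 * x + 1  # exactly linear in x
y_square <- x^2        # exactly determined by x, but not linearly

pearson_r(x, y_linear)  # 1
pearson_r(x, y_square)  # 0, despite perfect (nonlinear) dependence
all.equal(pearson_r(x, y_linear), cor(x, y_linear))  # matches base R
```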

Using wordcloud on search terms & phrases

28.03.2012

The wordcloud package for R is great, but all the examples I found used the tm package to process a large amount of textual data (web pages, text files, Google Docs, etc.). But what if you have normalized data, where you have a word and its frequency? Or what if you have phrases that you want in a wordcloud? One example is terms whic...
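
A minimal sketch of the pre-counted case, assuming the wordcloud package is installed. wordcloud() takes a character vector of words and a parallel vector of frequencies, so already-counted terms can skip the tm corpus step entirely, and because each element is drawn as a single label, multi-word phrases stay intact. The terms and counts are hypothetical, and this is not necessarily the approach the full post takes.

```r
# Sketch: word cloud from (term, frequency) pairs, without a tm corpus.
# The search terms and counts are hypothetical placeholders.
library(wordcloud)

terms <- data.frame(
  term = c("r programming", "statistics", "wordcloud",
           "data visualization", "tm package"),
  freq = c(40, 25, 18, 12, 5),
  stringsAsFactors = FALSE
)

set.seed(42)  # the layout is randomized; fix the seed for a repeatable picture
wordcloud(words = terms$term,    # each element is one label, so phrases stay whole
          freq = terms$freq,
          scale = c(3, 0.5),     # range of text sizes, largest to smallest
          min.freq = 1,          # the default (3) would silently drop rare terms
          random.order = FALSE)  # draw the most frequent terms first, near the centre
```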

The Best Statistical Programming Language is … JavaScript?

27.04.2012

R-bloggers has recently been buzzing about Julia, the new kid on the statistical programming block. Julia, however, is hardly the only contender for R defectors; the Clojure-based Incanter is generating buzz as well. Even with these two making noise, I think there’s a huge point that everyone is missing, and it’s front-and-cent...

R solutions to Project Euler — problem 1

15.05.2012

Things have been hectic since I started this blog. Tasks piled up while I was short on time. At present, I’m facing a major challenge in my life. However, I have decided to set aside some time for self-improvement. R is one of the most useful tools I’ve learned in my research career. Learning R has always been on my to-do list, but my practice is ...
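
Project Euler problem 1 asks for the sum of all natural numbers below 1000 that are multiples of 3 or 5. The excerpt does not show the post’s own code, so the following is one vectorised sketch in R, using the problem’s worked example (below 10: 3 + 5 + 6 + 9 = 23) as a check.

```r
# Sum of the natural numbers below n that are multiples of 3 or 5
sum_multiples_3_or_5 <- function(n = 1000) {
  x <- seq_len(n - 1)                # 1, 2, ..., n - 1 ("below n")
  sum(x[x %% 3 == 0 | x %% 5 == 0])  # vectorised filter, no explicit loop
}

sum_multiples_3_or_5(10)  # 23, the worked example from the problem statement
sum_multiples_3_or_5()    # 233168
```

For much larger limits, inclusion-exclusion (multiples of 3 plus multiples of 5, minus multiples of 15, each summed with the arithmetic-series formula) gives the same answer without building the vector.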

Project Euler — problem 2

21.05.2012

It’s almost time for bed, so here is a quick solution to the second problem of Project Euler. Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be: 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, … By considering the terms in the Fibonacci sequence whose values...
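
The excerpt stops before the actual question; Project Euler problem 2 asks for the sum of the even-valued terms that do not exceed four million. A simple iterative sketch, not necessarily the author’s solution, with the first ten terms listed above (whose even terms 2, 8 and 34 sum to 44) as a small check:

```r
# Sum of the even-valued Fibonacci terms not exceeding `limit`,
# with the sequence starting 1, 2 as in the problem statement
even_fib_sum <- function(limit = 4e6) {
  a <- 1
  b <- 2
  total <- 0
  while (a <= limit) {
    if (a %% 2 == 0) total <- total + a  # keep only even-valued terms
    nxt <- a + b                         # advance the sequence by one term
    a <- b
    b <- nxt
  }
  total
}

even_fib_sum(89)  # 44 = 2 + 8 + 34
even_fib_sum()    # 4613732
```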