Publications by hrbrmstr

ThinkStats … in R :: Example 1.3

07.03.2012

With 1.2 under our belts, we now move to the example in section 1.3, which was designed to show us how to partition a larger set of data into subsets for analysis. In this case, we’re going to jump to example 1.3.2 to determine the number of live births. While the Python loop is easy to write, the R code is even easier: livebirths <- subset(preg...

1917 sym R (2586 sym/8 pcs) 4 tbl

ThinkStats … in R :: Example/Chapter 2 :: Example 2.1-2.3

14.03.2012

As promised, this post is a bit more graphical, but I feel the need to stress the importance of the first few points in chapter 2 of the book (i.e. the difference between mean and average and why variance is meaningful). These are fundamental concepts for future work. The “pumpkin” example (2.1) gives us an opportunity to do some very basic R...

3780 sym R (1243 sym/16 pcs) 8 img 8 tbl

DIY ZeroAccess GeoIP Plots

05.10.2012

Since F-Secure was #spiffy enough to provide us with GeoIP data for mapping the scope of the ZeroAccess botnet, I thought that some aspiring infosec data scientists might want to see how to use something besides Google Maps & Google Earth to view the data. If you look at the CSV file, it’s formatted as such (this is a small portion…the file i...

2251 sym R (1176 sym/5 pcs) 6 img 5 tbl

DIY ZeroAccess GeoIP Analysis : So What?

08.10.2012

NOTE: A great deal of this post comes from @jayjacobs as he took a conversation we were having about thoughts on ways to look at the data and just ran like the Flash with it. Did you know that – if you’re a US citizen – you have approximately a 1 in 5 chance of getting the flu this year? If you’re a male (no regional bias for this one),...

3119 sym R (2049 sym/3 pcs) 6 img 3 tbl

Get an R Data Frame from a MongoDB Query

22.10.2012

There’s a good FAQ on how to do the MongoDB query -> R data frame but I wanted to post a more complete example that included the database connection and query setup since I suspect there are folks new to Mongo who would appreciate the end-to-end view. The code is fully annotated with comments, and I’ll caveat that this was for pulling data fr...

819 sym R (1942 sym/1 pcs) 1 tbl

Watch “Sandy” In R

27.10.2012

UPDATE: Significantly updated code on github. Well, a couple folks asked how to make it more “centered” on the hurricane and stop the labels from chopping off, so I modified the previous code a bit to show how to do that. As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived ...

1304 sym R (869 sym/1 pcs) 2 img 1 tbl

Watch Sandy in “R” (Including Forecast Cone)

28.10.2012

As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight. NOTE: There is significantly updated code on github for the Sandy ‘R’ dataviz. This is a follow-up post to the quickly crafted Watch Sandy in “R” post last night. I noticed that Google provid...

884 sym R (2366 sym/1 pcs) 2 img 1 tbl

‘Sandy’ Code Up On Github

29.10.2012

UPDATE: As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight. I will (most likely) not be littering the blog with any more updates to the ‘Sandy’ code unless they are really significant. You can follow along at home to any changes over at github. Of...

828 sym 2 img

Forbes Graph Makeover Contest Entry #1

05.12.2012

Naomi Robbins is running a graph makeover challenge over at her Forbes blog and this is my entry for the B2B/B2C Traffic Sources one (click for larger version): And, here’s the R source for how to generate it: library(ggplot2) df = read.csv("b2bb2c.csv") ggplot(data=df,aes(x=Site,y=Percentage,fill=Site)) + geom_bar(stat="identity") + f...

821 sym R (503 sym/2 pcs) 2 img 1 tbl

Slopegraphs in R

11.01.2013

I updated the code to use ggsave and tweaked some of the font & line size values for more consistent (and pretty) output. This also means that I really need to get this up on github. If you even remotely follow this blog, you’ll see that I’m kinda obsessed with slopegraphs. While I’m pretty happy with my Python implementation, I do quite a ...

1823 sym R (3696 sym/1 pcs) 2 img 1 tbl