Publications by PirateGrunt

Pro Football Data

01.12.2012

I’ve made the acquaintance of a group of data analysts here in the Triangle and have agreed to arrange an outing to see the Durham Bulls, the local minor league baseball team. Because it’s for stat nerds and because I was curious, I went looking for some baseball data to analyze. I found loads of it here, but soon got distracted by the presence of NFL stati...

2022 sym R (1412 sym/1 pcs) 4 img

NFL Prediction – Algorithm 1

05.12.2012

So I tidied the code up a bit from last time; no more for loop. Actually, I tidied it up a lot. My goal had been to arrange the data in such a way that I could get a simple moving average of the score difference for each team. That wound up being a semi-lengthy process. So, now, I have a function which will return the game results for a single se...

3506 sym R (1681 sym/2 pcs) 4 img

How to spend an inordinate amount of time becoming efficient

06.12.2012

I’ve spent a good deal of 2012 constructing a data warehouse to manage all the various data elements that my company has. Although we’re a small enterprise, the richness and complexity of the information is rather high. Moreover, as a data-driven organization, there’s a strong impetus to construct meaningful analysis with every bit of input...

3400 sym R (365 sym/1 pcs) 4 img

Testing Assumption Testing

13.12.2012

I’ve been doing a lot of linear modeling this year. That’s not much different than any ordinary year, but now I’m doing it in R. I had spent a bit of time in recent years trying to look at loss reserving as a multivariate regression. Excel is happy to do that, but testing various predictor variables and applying the methodology to many data...

4789 sym R (2166 sym/2 pcs) 10 img

How I learned to stop worrying and really love lists

14.12.2012

One of the first weird things to get used to in R is unlearning some of the things that you think you know. As often happens, this reminds me of a quote I once read about Zen, which went about like this (I’m paraphrasing), “When I knew nothing of Zen, mountains were mountains, rivers were rivers and the sky was the sky. When I knew a little o...

2571 sym R (150 sym/1 pcs) 4 img

Turnovers are poison

20.12.2012

This is probably a slightly useless post, but a bit of fun all the same. If nothing else, it allows me to take a stab at learning a bit more about logistic regression. I’m still trying to unravel the mystery of why the Bears lost to the Vikings two weeks ago. This mystery is compounded by my attempt to understand how the Patriots lost to the ...

2821 sym R (2114 sym/1 pcs) 8 img

Querying, parsimony and golden hammers

20.12.2012

I love it when things are easy. I love it so much that I’ll spend a great deal of time and effort to keep things simple. At the same time, though, I think there’s some value in expending effort in pursuit of something. If you want to understand a thing, you have to spend time with it and accept it on its own terms. Which brings me to sqldf an...

3427 sym 4 img

Nested loops with mapply

31.12.2012

So as I sink deeper into the second level of R enlightenment, one thing has troubled me. “lapply” is fine for looping over a single vector of elements, but it doesn’t handle a nested loop structure. Nested loops are pretty ubiquitous for me: I’m forever doing the same thing to a set of two or three different variables. “apply” smells like a...

2793 sym R (1211 sym/5 pcs) 4 img

NFL Code on Github

02.01.2013

I’ve made some revisions and simplifications to the code that compiles NFL data. It’s now all out on Github for anyone to play with in advance of the Super Bowl. In the meantime, here’s a lovely picture comparing every team’s offense, as measured by total offensive yards, against their defenders. Note the anemic Chicago offense. https://gith...

765 sym R (318 sym/1 pcs) 6 img

You can’t spell loss reserving without R

02.01.2013

Last year, I spent a morning trying to return to first principles when modeling loss reserves. (Brief aside to non-actuaries: a loss reserve is the financial provision set aside to pay for claims which have either not yet settled, or have not yet been reported. If that doesn’t sound fascinating, this will likely be a fairly dull post.) I start ...

5034 sym R (3763 sym/4 pcs) 8 img