Publications by Jacob Simmering

How slow is R really?

28.01.2013

One thing you always hear about R is how slow it is, especially when the code is not well vectorized or includes loops. But R is an interpreted language, and its strong suit really isn’t speed; its comparative advantage is the 4,284 packages on CRAN. We accept the slower speed for the time saved from not having to re-invent the wheel e...

2971 sym R (218 sym/1 pcs)
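
As a rough illustration of the loop-versus-vectorized comparison this post describes, here is a minimal sketch in base R. The function names `sum_sq_loop` and `sum_sq_vec` and the sum-of-squares task are assumptions chosen for illustration, not the benchmark used in the post.

```r
# Sum of squares computed two ways: an explicit loop and a vectorized call.
# Both function names are hypothetical, chosen only for this sketch.
sum_sq_loop <- function(x) {
  total <- 0
  for (xi in x) {
    total <- total + xi^2
  }
  total
}

sum_sq_vec <- function(x) sum(x^2)

set.seed(1)
x <- runif(1e5)

# The two approaches should agree up to floating-point error.
stopifnot(isTRUE(all.equal(sum_sq_loop(x), sum_sq_vec(x))))

# Elapsed times depend on the machine and R version, so none are quoted here.
loop_time <- system.time(sum_sq_loop(x))["elapsed"]
vec_time  <- system.time(sum_sq_vec(x))["elapsed"]
```

Comparing `loop_time` and `vec_time` on your own machine gives a sense of the gap; the size of that gap is not something this sketch can promise.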

Maximize Your Expectations!

30.01.2013

A Problem. A major problem in secondary data analysis is that you didn't get to decide what data was collected. Let's say you were interested in how many times a student has read the Twilight books. Specifically, you want to know how effective the ads for the movies and books are. You come up with a model that says \( n_{T} = f(n_{VB}, \text{ads},...

9242 sym R (7433 sym/13 pcs) 10 img

Taking Expectations to the Next Level

31.01.2013

Higher Expectations. I came across this post on Thursday and found it to be quite interesting. Clearly rental prices vary according to where you live. That isn't too surprising. I started thinking a bit more about it and thought that Boston and the nearby communities would have to have some differences: the area near campus would probably be hi...

4181 sym R (7684 sym/9 pcs) 6 img

Fixing My Internet With R and Python

20.02.2013

Last summer, I had some internet connectivity problems. Specifically, I would have massive latency issues that affected my conversations on Skype and my efforts at online gaming, which are relatively pathetic under the best of circumstances. It was driving me up a wall and I couldn't figure it out. It hadn't occurred earlier with the same ISP so I thought i...

6246 sym R (1251 sym/4 pcs) 6 img

TV Ratings Myths

28.08.2013

TV Show Cancellations: Myths and Models. TV shows are amazing ways to waste time and, on occasion, the story is so good that you actually start to care. The problem is that some shows get cancelled before they jump the shark. Classic examples are shows like Firefly or Arrested Development. With the increasing serialization of TV shows, having ...

6841 sym R (4839 sym/7 pcs) 10 img

Penalizing P Values

19.11.2013

Ioannidis' paper suggesting that most published results in medical research are not true is now high profile enough that even my dad, an artist who wouldn't know a test statistic if it hit him in the face, knows about it. It has even shown up recently in The Economist as a cover article and plays directly into the “decli...

5359 sym

Instrumental Variables Simulation

09.01.2014

Instrumental Variables. Instrumental variables are an incredibly powerful tool for dealing with unobserved heterogeneity within the context of regression, but the language used to define them is mind-bending. Typically, you hear something along the lines of “an instrumental variable is a variable that is correlated with x but uncorrelated with the out...

6912 sym R (862 sym/7 pcs)
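
To make the idea in this excerpt concrete, here is a minimal simulation sketch in base R. The data-generating process (a confounder `u`, an instrument `z`, and a true effect of 2) is an assumption made for illustration and is not taken from the post. It compares ordinary least squares with the simple ratio-of-covariances instrumental variables estimate, and checks that a manual two-stage fit gives the same number.

```r
set.seed(42)
n <- 10000

u <- rnorm(n)                     # unobserved heterogeneity (assumed)
z <- rnorm(n)                     # instrument: drives x, independent of u
x <- z + u + rnorm(n)             # regressor, contaminated by u
y <- 2 * x + 3 * u + rnorm(n)     # outcome; the true effect of x is 2

# OLS ignores u, so its slope absorbs part of the effect of u.
ols <- unname(coef(lm(y ~ x))["x"])

# With one instrument, the IV estimate is cov(z, y) / cov(z, x).
iv <- cov(z, y) / cov(z, x)

# The same estimate via two stages: regress x on z, then y on the fitted values.
# (Standard errors from this manual second stage are not valid.)
x_hat <- fitted(lm(x ~ z))
tsls <- unname(coef(lm(y ~ x_hat))["x_hat"])

c(ols = ols, iv = iv, tsls = tsls)
```

In this setup the OLS slope should sit well above 2 while the IV estimate should land close to 2, because `z` moves `x` without being related to `u`.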

Bayesian Search Models

13.03.2014

Bayesian Search Theory. The US had a pretty big problem on its hands in 1966. Two planes had hit each other during an in-flight refueling and crashed. Normally, this would be an unfortunate thing and terrible for the families of those involved in the crash but otherwise fairly limited in importance. However, in this case, the plane being refuel...

7581 sym R (2580 sym/7 pcs) 14 img

Stop using bivariate correlations for variable selection

19.03.2014

Something I've never understood is the widespread calculation and reporting of univariate and bivariate statistics in applied work, especially when it comes to model selection. Bivariate statistics are, at best, useless for multivariate model selection and, at worst, harmful. Since nearly...

5223 sym R (3041 sym/7 pcs) 2 img
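
One way a bivariate screen can mislead is sketched below in base R. The simulated variables `x1`, `x2`, and `y` are assumptions invented for this illustration, not data or code from the post. By construction, `x1` is uncorrelated with `y` on its own, yet it has a coefficient of 1 once `x2` is in the model.

```r
set.seed(7)
n <- 5000

x1 <- rnorm(n)
w  <- rnorm(n)
x2 <- -x1 + w                          # x2 is negatively related to x1
y  <- x1 + x2 + rnorm(n, sd = 0.5)     # both predictors have a true coefficient of 1

# Bivariate view: the correlation between x1 and y is close to zero,
# because the direct effect of x1 is offset through x2.
r_x1_y <- cor(x1, y)

# Joint model: the coefficient on x1 is recovered once x2 is included.
fit <- lm(y ~ x1 + x2)
b1 <- unname(coef(fit)["x1"])

c(cor_x1_y = r_x1_y, coef_x1 = b1)
```

A selection rule that drops predictors with a weak bivariate correlation would discard `x1` here, even though the joint model needs it.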

Frequentist German Tank Problem

20.03.2014

The German Tank Problem: The Frequentist Way. Many things are given a serial number, and often that serial number, logically, starts at 1 and increases by 1 for each new unit. For example, German tanks in World War II had several parts with serial numbers. By collecting the values of these numbers, Allied statisticians could produce estimates of...

8670 sym R (1863 sym/5 pcs) 8 img
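
For readers who want to try the idea, here is a minimal sketch of the textbook frequentist estimator, \( \hat{N} = m + m/k - 1 \), where \( m \) is the largest observed serial number and \( k \) is the number of serials observed. The excerpt does not show which estimator the post itself uses, so treat this as an assumption; the function name `german_tank_estimate` and the population size of 500 are hypothetical.

```r
# Textbook estimator for the population maximum when serials run 1..N
# and are sampled without replacement.
german_tank_estimate <- function(serials) {
  k <- length(serials)   # number of serials observed
  m <- max(serials)      # largest serial observed
  m + m / k - 1
}

# Small worked case: m = 60, k = 4, so the estimate is 60 + 15 - 1.
german_tank_estimate(c(19, 40, 42, 60))   # 74

# Simulation check with a hypothetical true population size.
set.seed(1)
true_n <- 500
estimates <- replicate(2000, german_tank_estimate(sample.int(true_n, size = 10)))
mean_estimate <- mean(estimates)
```

Averaged over many simulated samples, `mean_estimate` should sit close to `true_n`, which is what unbiasedness of this estimator implies.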