Publications by Jacob Simmering
How slow is R really?
One thing you always hear about R is how slow it is, especially when the code is not well vectorized or includes loops. But R is an interpreted language, and its strong suit really isn’t speed; its comparative advantage is the 4,284 packages on CRAN. We accept the slower speed for the time saved from not having to re-invent the wheel e...
2971 sym R (218 sym/1 pcs)
Maximize Your Expectations!
A Problem A major problem in secondary data analysis is that you didn't get to decide what data was collected. Let's say you were interested in how many times a student has read the Twilight books. Specifically, you want to know how effective the ads for the movies and books are. You come up with a model that says \( n_{T} = f(n_{VB}, \text{ads},...
9242 sym R (7433 sym/13 pcs) 10 img
Taking Expectations to the Next Level
Higher Expectations I came across this post on Thursday and found it to be quite interesting. Clearly rental prices vary according to where you live. That isn't too surprising. I started thinking a bit more about it and thought that Boston and the nearby communities would have to have some differences — the area near campus would probably be hi...
4181 sym R (7684 sym/9 pcs) 6 img
Fixing My Internet With R and Python
Last summer, I had some internet connectivity problems. Specifically, I would have massive latency issues that affected my conversations on Skype and my efforts at online gaming, which are relatively pathetic under the best of circumstances. It was driving me up a wall and I couldn't figure it out. It hadn't occurred earlier with the same ISP so I thought i...
6246 sym R (1251 sym/4 pcs) 6 img
TV Ratings Myths
TV Show Cancellations: Myths and Models TV shows are amazing ways to waste time and, on occasion, the story is so good that you actually start to care. The problem is that some shows get cancelled before they jump the shark. Classic examples are shows like Firefly or Arrested Development. With the increasing serialization of TV shows, having ...
6841 sym R (4839 sym/7 pcs) 10 img
Penalizing P Values
Penalizing P Values Ioannidis' paper suggesting that most published results in medical research are not true is now high profile enough that even my dad, an artist who wouldn't know a test statistic if it hit him in the face, knows about it. It has even shown up recently in the Economist as a cover article and plays directly into the “decli...
5359 sym
Instrumental Variables Simulation
Instrumental Variables Instrumental variables are an incredibly powerful tool for dealing with unobserved heterogeneity within the context of regression, but the language used to define them is mind-bending. Typically, you hear something along the lines of “an instrumental variable is a variable that is correlated with x but uncorrelated with the out...
6912 sym R (862 sym/7 pcs)
Bayesian Search Models
Bayesian Search Theory The US had a pretty big problem on their hands in 1966. Two planes had hit each other during an in-flight refueling and crashed. Normally, this would be an unfortunate thing and terrible for the families of those involved in the crash but otherwise fairly limited in importance. However, in this case, the plane being refuel...
7581 sym R (2580 sym/7 pcs) 14 img
Stop using bivariate correlations for variable selection
Stop using bivariate correlations for variable selection Something I've never understood is the widespread calculation and reporting of univariate and bivariate statistics in applied work, especially when it comes to model selection. Bivariate statistics are, at best, useless for multi-variate model selection and, at worst, harmful. Since nearly...
5223 sym R (3041 sym/7 pcs) 2 img
Frequentist German Tank Problem
The German Tank Problem: The Frequentist Way Many things are given a serial number and often that serial number, logically, starts at 1 and for each new unit is increased by 1. For example, German tanks in World War II had several parts with serial numbers. By collecting the value of these numbers, Allied statisticians could produce estimates of...
8670 sym R (1863 sym/5 pcs) 8 img