Publications by Keith Goldfeld
Everyone knows that loops in R are to be avoided, but vectorization is not always possible
It goes without saying that there are always many ways to solve a problem in R, but clearly some ways are better (for example, faster) than others. Recently, I found myself in a situation where I could not find a way to avoid using a loop, and I was immediately concerned, knowing that I would want this code to be flexible enough to run with a ver...
6339 sym R (7620 sym/11 pcs)
It can be easy to explore data generating mechanisms with the simstudy package
I learned statistics and probability by simulating data. Sure, I did the occasional proof, but I never believed the results until I saw them in a simulation. I guess I have it backwards, but that’s just the way I am. And now that I am a so-called professional, I continue to use simulation to understand models, to do sample size estimates and po...
5026 sym R (7076 sym/8 pcs) 6 img 1 tbl
It can be easy to explore data generating mechanisms with the simstudy package
I learned statistics and probability by simulating data. Sure, I battled my way through proofs, but I never believed the results until I saw them in a simulation. I guess I have it backwards, but it worked for me. And now that I do this for a living, I continue to use simulation to understand models, to do sample size estimates and power calculations, ...
5533 sym R (7298 sym/8 pcs) 10 img
When marginal and conditional logistic model estimates diverge
Say we have an intervention that is assigned at a group or cluster level but the outcome is measured at an individual level (e.g. students in different schools, eyes on different individuals). And, say this outcome is binary; that is, something happens, or it doesn’t. (This is important, because none of this is true if the outcome is continuou...
6041 sym R (5637 sym/8 pcs) 6 img
Copulas and correlated data generation: getting beyond the normal distribution
Using the simstudy package, it’s possible to generate correlated data from a normal distribution using the function genCorData. I’ve wanted to extend the functionality so that we can generate correlated data from other sorts of distributions; I thought it would be a good idea to begin with binary and Poisson distributed data, since those come...
5220 sym R (5774 sym/4 pcs) 4 img
Balancing on multiple factors when the sample is too small to stratify
Ideally, a study that uses randomization provides a balance of characteristics that might be associated with the outcome being studied. This way, we can be more confident that any differences in outcomes between the groups are due to the group assignments and not to differences in characteristics. Unfortunately, randomization does not guarantee b...
4956 sym R (7665 sym/10 pcs)
simstudy update: two new functions that generate correlated observations from non-normal distributions
In an earlier post, I described in a fair amount of detail an algorithm to generate correlated binary or Poisson data. I mentioned that I would be updating simstudy with functions that would make generating these kinds of data relatively painless. Well, I have managed to do that, and the updated package (version 0.1.3) is available for download fr...
3569 sym R (9192 sym/13 pcs) 2 img
Using simulation for power analysis: an example based on a stepped wedge study design
Simulation can be super helpful for estimating power or sample size requirements when the study design is complex. This approach has some advantages over an analytic one (i.e. one based on a formula), particularly the flexibility it affords in setting up the specific assumptions in the planned study, such as time trends, patterns of missingness,...
8141 sym R (3899 sym/6 pcs) 4 img 1 tbl
Should we be concerned about incidence – prevalence bias?
Recently, we were planning a study to evaluate the effect of an intervention on outcomes for very sick patients who show up in the emergency department. My collaborator had concerns about a phenomenon that she had observed in other studies that might affect the results – patients measured earlier in the study tend to be sicker than those measur...
7498 sym R (8047 sym/5 pcs) 8 img
Be careful not to control for a post-exposure covariate
A researcher was presenting an analysis of the impact various types of childhood trauma might have on subsequent substance abuse in adulthood. Obviously, a very interesting and challenging research question. The statistical model included adjustments for several factors that are plausible confounders of the relationship between trauma and substan...
7777 sym R (3093 sym/10 pcs) 4 img