Publications by Keith Goldfeld
Balancing on multiple factors when the sample is too small to stratify
Ideally, a study that uses randomization provides a balance of characteristics that might be associated with the outcome being studied. This way, we can be more confident that any differences in outcomes between the groups are due to the group assignments and not to differences in characteristics. Unfortunately, randomization does not guarantee b...
4956 sym R (7665 sym/10 pcs)
simstudy update: two new functions that generate correlated observations from non-normal distributions
In an earlier post, I described in a fair amount of detail an algorithm to generate correlated binary or Poisson data. I mentioned that I would be updating simstudy with functions that would make generating these kinds of data relatively painless. Well, I have managed to do that, and the updated package (version 0.1.3) is available for download fr...
3569 sym R (9192 sym/13 pcs) 2 img
Using simulation for power analysis: an example based on a stepped wedge study design
Simulation can be super helpful for estimating power or sample size requirements when the study design is complex. This approach has some advantages over an analytic one (i.e., one based on a formula), particularly the flexibility it affords in setting up the specific assumptions in the planned study, such as time trends, patterns of missingness,...
8141 sym R (3899 sym/6 pcs) 4 img 1 tbl
Should we be concerned about incidence – prevalence bias?
Recently, we were planning a study to evaluate the effect of an intervention on outcomes for very sick patients who show up in the emergency department. My collaborator had concerns about a phenomenon that she had observed in other studies that might affect the results – patients measured earlier in the study tend to be sicker than those measur...
7498 sym R (8047 sym/5 pcs) 8 img
Be careful not to control for a post-exposure covariate
A researcher was presenting an analysis of the impact various types of childhood trauma might have on subsequent substance abuse in adulthood. This is obviously a very interesting and challenging research question. The statistical model included adjustments for several factors that are plausible confounders of the relationship between trauma and substan...
7777 sym R (3093 sym/10 pcs) 4 img
A hidden process behind binary or other categorical outcomes?
I was thinking a lot about proportional-odds cumulative logit models last fall while designing a study to evaluate an intervention’s effect on meat consumption. After a fairly extensive pilot study, we had determined that participants can have quite a difficult time recalling precise quantities of meat consumption, so we were forced to move to ...
7590 sym R (2423 sym/9 pcs) 6 img
Further considerations of a hidden process underlying categorical responses
In my previous post, I described a continuous data generating process that can be used to generate discrete, categorical outcomes. In that post, I focused largely on binary outcomes and simple logistic regression just because things are always easier to follow when there are fewer moving parts. Here, I am going to focus on a situation where we ha...
10233 sym R (11217 sym/16 pcs) 4 img
Complier average causal effect? Exploring what we learn from an RCT with participants who don’t do what they are told.
Inspired by a free online course titled Complier Average Causal Effects (CACE) Analysis and taught by Booil Jo and Elizabeth Stuart (through Johns Hopkins University), I’ve decided to explore the topic a little bit. My goal here isn’t to explain CACE analysis in extensive detail (you should definitely go take the course for that), but to desc...
9582 sym R (3909 sym/4 pcs) 4 img
A simstudy update provides an excuse to talk a little bit about latent class regression and the EM algorithm
I was just going to make a quick announcement to let folks know that I’ve updated the simstudy package to version 0.1.4 (now available on CRAN) to include functions that allow conversion of columns to factors, creation of dummy variables, and most importantly, specification of outcomes that are more flexibly conditional on previously defined va...
8828 sym R (6940 sym/11 pcs) 8 img
CACE closed: EM opens up exclusion restriction (among other things)
This is the third, and probably last, of a series of posts touching on the estimation of complier average causal effects (CACE) and latent variable modeling techniques using an expectation-maximization (EM) algorithm. What follows is a simplistic way to implement an EM algorithm in R to do principal strata estimation of CACE. The EM algorithm I...
7692 sym R (7430 sym/5 pcs) 6 img