Publications by Keith Goldfeld
The power of stepped-wedge designs
Just before heading out on vacation last month, I put up a post that purported to compare stepped-wedge study designs with more traditional cluster randomized trials. Either because I rushed or was just lazy, I didn’t exactly do what I set out to do. I did confirm that a multi-site randomized clinical trial can be more efficient than a cluster ...
8897 sym R (5624 sym/9 pcs) 22 img
Binary, beta, beta-binomial
I’ve been working on updates for the simstudy package. In the past few weeks, a couple of folks independently reached out to me about generating correlated binary data. One user was not impressed by the copula algorithm that is already implemented. I’ve added an option to use an algorithm developed by Emrich and Piedmonte in 1991, and will be...
7568 sym R (5777 sym/9 pcs) 6 img
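A beta-binomial process of the kind named in the title above can be sketched in base R, as a rough illustration of why it yields correlated binary outcomes. This is not the simstudy implementation and not the Emrich and Piedmonte algorithm; the cluster count, cluster size, and shape parameters `a` and `b` are assumptions chosen only for the example.

```r
# Minimal sketch (base R only): cluster-level probabilities drawn from a
# beta distribution induce correlation among binary outcomes in a cluster.
set.seed(2018)

n_clusters <- 2000   # hypothetical number of clusters
m <- 10              # hypothetical number of binary outcomes per cluster
a <- 2               # assumed beta shape parameters: mean = a / (a + b) = 0.25
b <- 6

# One success probability per cluster
p <- rbeta(n_clusters, a, b)

# Binary outcomes: row i holds the m outcomes for cluster i
y <- matrix(rbinom(n_clusters * m, size = 1, prob = rep(p, each = m)),
            nrow = n_clusters, ncol = m, byrow = TRUE)

# Average pairwise correlation between outcomes within a cluster;
# in theory this is 1 / (a + b + 1)
cors <- cor(y)
rho_hat <- mean(cors[upper.tri(cors)])
rho_theory <- 1 / (a + b + 1)

overall_mean <- mean(y)
```

Comparing `rho_hat` with `rho_theory` shows how the two shape parameters jointly control both the marginal probability and the within-cluster correlation.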
simstudy update: improved correlated binary outcomes
An updated version of the simstudy package (0.1.10) is now available on CRAN. The impetus for this release was a series of requests about generating correlated binary outcomes. In the last post, I described a beta-binomial data generating process that uses the recently added beta distribution. In addition to that update, I’ve added functionalit...
4143 sym R (6177 sym/11 pcs) 2 img
In regression, we assume noise is independent of all measured predictors. What happens if it isn’t?
A number of key assumptions underlie the linear regression model – among them linearity and normally distributed noise (error) terms with constant variance. In this post, I consider an additional assumption: the unobserved noise is uncorrelated with any covariates or predictors in the model. In this simple model: \[Y_i = \beta_0 + \beta_1X_i + e...
5653 sym R (2604 sym/7 pcs) 10 img
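The question raised in the excerpt above can be explored with a small simulation. The following is a sketch under assumed values, not the code from the post: the intercept of 1, slope of 2, and the shared unobserved factor `u` are all hypothetical choices made for illustration.

```r
# Minimal sketch: compare the OLS slope when the noise is independent of x
# with the slope when the noise and x share an unobserved component.
set.seed(123)

n <- 10000
u <- rnorm(n)                 # unobserved factor (assumption for this example)
x <- 0.8 * u + rnorm(n)       # predictor depends on u

e_indep <- rnorm(n)           # noise independent of x
e_corr  <- 0.8 * u + rnorm(n) # noise correlated with x through u

# Assumed true model: intercept 1, slope 2
y_indep <- 1 + 2 * x + e_indep
y_corr  <- 1 + 2 * x + e_corr

b_indep <- unname(coef(lm(y_indep ~ x))["x"])
b_corr  <- unname(coef(lm(y_corr ~ x))["x"])

# Expected bias of the slope in the correlated case: cov(x, e) / var(x)
bias_theory <- (0.8 * 0.8) / (0.8^2 + 1)
```

With independent noise, `b_indep` should land close to the assumed slope of 2, while `b_corr` should be shifted upward by roughly `bias_theory`.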
Cross-over study design with a major constraint
Every new study presents its own challenges. (I would have to say that one of the great things about being a biostatistician is the immense variety of research questions that I get to wrestle with.) Recently, I was approached by a group of researchers who wanted to evaluate an intervention. Actually, they had two, but the second one was a minor t...
6216 sym R (6291 sym/11 pcs) 8 img
Causal mediation estimation measures the unobservable
I put together a series of demos for a group of epidemiology students who are studying causal mediation analysis. Since mediation analysis is not always so clear or intuitive, I thought, of course, that going through some examples of simulating data for this process could clarify things a bit. Quite often we are interested in understanding the re...
9574 sym R (4582 sym/15 pcs) 6 img 1 tbl
Generating data to explore the myriad causal effects that can be estimated in observational data analysis
I’ve been inspired by two recent talks describing the challenges of using instrumental variable (IV) methods. IV methods are used to estimate the causal effects of an exposure or intervention when there is unmeasured confounding. This estimated causal effect is very specific: the complier average causal effect (CACE). But the CACE is just one ...
9558 sym R (3350 sym/6 pcs) 12 img
Horses for courses, or to each model its own (causal effect)
In my previous post, I described a (relatively) simple way to simulate observational data in order to compare different methods to estimate the causal effect of some exposure or treatment on an outcome. The underlying data generating process (DGP) included a possibly unmeasured confounder and an instrumental variable. (If you haven’t already, y...
8595 sym R (1954 sym/9 pcs) 4 img
Parallel processing to add a little zip to power simulations (and other replication studies)
It’s always nice to be able to speed things up a bit. My first blog post ever described an approach using Rcpp to make huge improvements in a particularly intensive computational process. Here, I want to show how simple it is to speed things up by using the R package parallel and its function mclapply. I’ve been using this function more and m...
5753 sym R (1845 sym/9 pcs) 4 img
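As a rough illustration of the approach described in the excerpt above, a power simulation can be spread across cores with `mclapply` from the `parallel` package. The replication function `one_rep`, the sample size, the effect size, and the core count are assumptions made for this sketch, not taken from the post.

```r
# Minimal sketch: estimate power for a two-sample t-test by running
# replications in parallel with parallel::mclapply.
library(parallel)

# One replication (hypothetical helper): simulate two arms and report
# whether the t-test rejects at the 0.05 level.
one_rep <- function(i, n = 50, delta = 0.5) {
  y0 <- rnorm(n)
  y1 <- rnorm(n, mean = delta)
  t.test(y1, y0)$p.value < 0.05
}

# mclapply relies on forking, which Windows does not support,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# L'Ecuyer-CMRG streams make the parallel random numbers reproducible
RNGkind("L'Ecuyer-CMRG")
set.seed(2718)

res <- mclapply(1:1000, one_rep, mc.cores = n_cores)
power <- mean(unlist(res))
```

Because `mclapply` has the same calling pattern as `lapply`, an existing serial simulation can usually be switched over by changing the function name and adding `mc.cores`.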
Considering sensitivity to unmeasured confounding: part 1
Principled causal inference methods can be used to compare the effects of different exposures or treatments we have observed in non-experimental settings. These methods, which include matching (with or without propensity scores), inverse probability weighting, and various g-methods, help us create comparable groups to simulate a randomized experi...
7995 sym R (2682 sym/10 pcs) 6 img 1 tbl