Publications by Keith Goldfeld
The case of three MAR mechanisms: when is multiple imputation mandatory?
I thought I’d written about this before, but I searched through my posts and I couldn’t find what I was looking for. If I am repeating myself, my apologies. I explored missing data two years ago, using directed acyclic graphs (DAGs) to help understand the various missing data mechanisms (MAR, MCAR, and MNAR). The DAGs provide insight into whe...
10307 sym R (2618 sym/10 pcs) 22 img
Generating random lists of names with errors to explore fuzzy word matching
Health data systems are not always perfect. This was made painfully obvious when a study I am involved with required a matched list of nursing home residents taken from one system with a set of results from PCR tests for COVID-19 drawn from another. Name spellings for the same person from the second list were not always consistent across different PCR...
6553 sym R (7045 sym/14 pcs) 2 img
Sample size determination in the context of Bayesian analysis
Given my recent involvement with the design of a somewhat complex trial centered around a Bayesian data analysis, I am appreciating more and more that Bayesian approaches are a very real option for clinical trial design. A key element of any study design is sample size. While some would argue that sample size considerations are not critical to th...
7648 sym R (2013 sym/7 pcs) 8 img
Estimating a risk difference (and confidence intervals) using logistic regression
The odds ratio (OR) – the effect size parameter estimated in logistic regression – is notoriously difficult to interpret. It is a ratio of two quantities (odds, under different conditions) that are themselves ratios of probabilities. I think it is pretty clear that a very large or small OR implies a strong treatment effect, but translating th...
9026 sym R (4211 sym/11 pcs) 8 img
Fitting your model is only the beginning: Bayesian posterior probability checks with rvars
Say we’ve collected data and estimated parameters of a model that give structure to the data. An important question to ask is whether the model is a reasonable approximation of the true underlying data generating process. If we did a good job, we should be able to turn around and generate data from the model itself that looks similar to the dat...
12870 sym R (10828 sym/22 pcs) 12 img
Posterior probability checking with rvars: a quick follow-up
This is a relatively brief addendum to last week’s post, where I described how the rvar datatype implemented in the R package posterior makes it quite easy to perform posterior probability checks to assess goodness of fit. In the initial post, I generated data from a linear model and estimated parameters for a linear regression model, and, unsu...
2677 sym R (1865 sym/5 pcs) 6 img
Subgroup analysis using a Bayesian hierarchical model
I’m part of a team that recently submitted the results of a randomized clinical trial for publication in a journal. The overall findings of the study were inconclusive, and we certainly didn’t try to hide that fact in our paper. Of course, the story was a bit more complicated, as the RCT was conducted during various phases of the COVID-19 pan...
15654 sym R (6446 sym/14 pcs) 4 img
Drawing the wrong conclusion about subgroups: a comparison of Bayes and frequentist methods
In the previous post, I simulated data from a hypothetical RCT that had heterogeneous treatment effects across subgroups defined by three covariates. I presented two Bayesian models, a strongly pooled model and an unpooled version, that could be used to estimate all the subgroup effects in a single model. I compared the estimates to a set of line...
5120 sym R (2688 sym/6 pcs) 2 img
Analyzing a factorial design by focusing on the variance of effect sizes
Way back in 2018, long before the pandemic, I described a soon-to-be-implemented simstudy function genMultiFac that facilitates the generation of multi-factorial study data. I followed up that post with a description of how we can use these types of efficient designs to answer multiple questions in the context of a single study. Fast forward thre...
12553 sym R (4687 sym/14 pcs) 18 img
A Bayesian analysis of a factorial design focusing on effect size estimates
Factorial study designs present a number of analytic challenges, not least of which is how to best understand whether simultaneously applying multiple interventions is beneficial. Last time I presented a possible approach that focuses on estimating the variance of effect size estimates using a Bayesian model. The scenario I used there focused on ...
9365 sym R (5102 sym/8 pcs) 6 img