Publications by Keith Goldfeld
Flexible simulation in simstudy with customized distribution functions
Really, the only problem with the simstudy package is that there is a hard limit to the possible probability distributions that are available (the current count is 15; see here for a complete description). However, it turns out that there is more flexibility than first meets the eye, and we can easily accommodate a limitless number as l...
3468 sym R (934 sym/7 pcs) 8 img
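The general idea of plugging in your own distribution can be sketched in base R. This is not simstudy's interface. `rmixnorm()` and `gen_custom()` are hypothetical names made up for illustration. The only assumption is that a custom generator takes the number of observations `n` as its first argument.

```r
# Hypothetical custom generator (not part of simstudy): a two-component
# normal mixture, a shape none of the built-in distributions gives directly.
rmixnorm <- function(n, p = 0.3, mu1 = 0, mu2 = 5, sd1 = 1, sd2 = 1) {
  from_first <- runif(n) < p              # component membership for each draw
  ifelse(from_first, rnorm(n, mu1, sd1), rnorm(n, mu2, sd2))
}

# Hypothetical wrapper: any function whose first argument is n can be used,
# with extra parameters passed through via ...
gen_custom <- function(n, fun, ...) {
  data.frame(id = seq_len(n), x = fun(n, ...))
}

set.seed(123)
dd <- gen_custom(1000, rmixnorm, p = 0.5, mu2 = 10)
```

Because the wrapper only calls `fun(n, ...)`, base generators such as `rexp` work just as well, for example `gen_custom(10, rexp, rate = 2)`.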
To impute or not: the case of an RCT with baseline and follow-up measurements
Under normal conditions, conducting a randomized clinical trial is challenging. Throw in a pandemic and things like site selection, patient recruitment and patient follow-up can be particularly vexing. In any study, subjects need to be retained long enough so that outcomes can be measured; during a period when there are so many potential disrupti...
9442 sym R (4051 sym/7 pcs) 6 img
simstudy updated to version 0.5.0
A new version of simstudy is available on CRAN. There are two major enhancements and several new features. In the "major" category, I would include (1) changes to survival data generation that accommodate hazard ratios that can change over time, as well as competing risks, and (2) the addition of functions to allow users to sample from existi...
4224 sym R (4655 sym/11 pcs) 4 img 1 tbl
Adding competing risks in survival data generation
I am working on an update of simstudy that will make generating survival/time-to-event data a bit more flexible. There are two biggish enhancements. The first facilitates generation of competing events, and the second allows for the possibility of generating survival data that has time-dependent hazard ratios. This post focuses on the first enhan...
3148 sym R (2726 sym/6 pcs) 2 img
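One common way to think about competing events is through latent times. Each subject gets a latent time for every event type plus censoring. The observed time is the earliest of them, and the event code records which one came first. The base-R sketch below assumes independent exponential latent times with made-up rates. simstudy's own implementation may do this differently.

```r
set.seed(2021)
n <- 5000
t1 <- rexp(n, rate = 0.10)   # latent time to event type 1 (illustrative rate)
t2 <- rexp(n, rate = 0.05)   # latent time to event type 2 (illustrative rate)
tc <- rexp(n, rate = 0.02)   # latent censoring time (illustrative rate)

times <- cbind(censor = tc, event1 = t1, event2 = t2)

# Observed time is whichever latent time occurs first
obs_time <- pmin(tc, t1, t2)

# Event code: 0 = censored, 1 = event type 1, 2 = event type 2
event <- apply(times, 1, which.min) - 1L

table(event)
```

With exponential latent times, the share of subjects whose first event is type 1 should be close to 0.10 / (0.10 + 0.05 + 0.02).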
Everyone knows that loops in R are to be avoided, but vectorization is not always possible
It goes without saying that there are always many ways to solve a problem in R, but clearly some ways are better (for example, faster) than others. Recently, I found myself in a situation where I could not find a way to avoid using a loop, and I was immediately concerned, knowing that I would want this code to be flexible enough to ru...
6362 sym R (7942 sym/11 pcs)
Everyone knows that loops in R are to be avoided, but vectorization is not always possible
It goes without saying that there are always many ways to solve a problem in R, but clearly some ways are better (for example, faster) than others. Recently, I found myself in a situation where I could not find a way to avoid using a loop, and I was immediately concerned, knowing that I would want this code to be flexible enough to run with a ver...
6339 sym R (7620 sym/11 pcs)
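A typical case where a loop is hard to avoid is a recursion: each value depends on the one before it. The sketch below uses an AR(1)-style series as a stand-in example. The function name `ar1_loop()` is hypothetical. It shows the usual mitigation of preallocating the output rather than growing it. For this particular linear recursion, `stats::filter()` with `method = "recursive"` happens to compute the same thing in compiled code, so it serves as a check.

```r
# y[i] = rho * y[i - 1] + e[i]: each step needs the previous result,
# so a single vectorized expression cannot produce the whole series.
ar1_loop <- function(e, rho) {
  y <- numeric(length(e))            # preallocate instead of growing y
  y[1] <- e[1]
  for (i in seq_along(e)[-1]) {
    y[i] <- rho * y[i - 1] + e[i]
  }
  y
}

set.seed(42)
e <- rnorm(1e5)
y_loop <- ar1_loop(e, rho = 0.8)

# Compiled equivalent for this specific recursion (starts from y[0] = 0)
y_filt <- as.numeric(stats::filter(e, filter = 0.8, method = "recursive"))
```

Most real recursions in simulation code are not this tidy, and then the preallocated loop (or a move to compiled code) is what's left.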
It can be easy to explore data generating mechanisms with the simstudy package
I learned statistics and probability by simulating data. Sure, I did the occasional proof, but I never believed the results until I saw it in a simulation. I guess I have it backwards, but that's just the way I am. And now that I am a so-called professional, I continue to use simulation to understand models, to do sample size estimates and po...
5026 sym R (7076 sym/8 pcs) 6 img 1 tbl
It can be easy to explore data generating mechanisms with the simstudy package
I learned statistics and probability by simulating data. Sure, I battled my way through proofs, but I never believed the results until I saw it in a simulation. I guess I have it backwards, but it worked for me. And now that I do this for a living, I continue to use simulation to understand models, to do sample size estimates and power calculations, ...
5533 sym R (7298 sym/8 pcs) 10 img
When marginal and conditional logistic model estimates diverge
Say we have an intervention that is assigned at a group or cluster level, but the outcome is measured at an individual level (e.g. students in different schools, eyes on different individuals). And, say this outcome is binary; that is, something happens, or it doesn't. (This is important, because none of this is true if the outcome is continuou...
6041 sym R (5637 sym/8 pcs) 6 img
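The divergence can be seen without simulating any individuals. Take a conditional (cluster-specific) logistic model, logit P(Y = 1 | A, b) = beta0 + beta1 * A + b, with a cluster effect b ~ N(0, sigma^2). The marginal (population-average) probability then averages the cluster-specific probability over b. The parameter values below are arbitrary assumptions chosen for illustration.

```r
beta0 <- -1    # illustrative intercept
beta1 <- 1     # conditional (cluster-specific) log odds ratio
sigma <- 2     # illustrative SD of the cluster random effect

# Marginal probability for treatment level a:
# integrate the conditional probability against the N(0, sigma^2) density
marg_prob <- function(a) {
  integrate(function(b) plogis(beta0 + beta1 * a + b) * dnorm(b, 0, sigma),
            lower = -Inf, upper = Inf)$value
}

p0 <- marg_prob(0)
p1 <- marg_prob(1)

# Marginal log odds ratio: attenuated toward 0 relative to beta1
marginal_log_or <- qlogis(p1) - qlogis(p0)
marginal_log_or
```

With an identity link, averaging over b would leave the treatment difference at exactly beta1. That is why the continuous-outcome case does not show this gap.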
Copulas and correlated data generation: getting beyond the normal distribution
Using the simstudy package, it's possible to generate correlated data from a normal distribution using the function genCorData. I've wanted to extend the functionality so that we can generate correlated data from other sorts of distributions; I thought it would be a good idea to begin with binary and Poisson distributed data, since those come...
5220 sym R (5774 sym/4 pcs) 4 img
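The Gaussian copula approach works in three steps:

1. Generate correlated standard normals.
2. Push them through `pnorm()` to get correlated uniforms.
3. Feed those uniforms to the quantile functions of the target distributions.

The base-R sketch below uses a Poisson and a binary margin with made-up parameters. It is an illustration of the technique, not simstudy's implementation.

```r
set.seed(1)
n   <- 10000
rho <- 0.6                                  # correlation on the normal scale
R   <- matrix(c(1, rho, rho, 1), nrow = 2)

# 1. Correlated standard normals: iid normals times the Cholesky factor of R
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(R)

# 2. Copula step: map each margin to Uniform(0, 1)
u <- pnorm(z)

# 3. Apply the target marginals' quantile functions
y1 <- qpois(u[, 1], lambda = 4)              # Poisson margin
y2 <- qbinom(u[, 2], size = 1, prob = 0.3)   # binary margin

cor(y1, y2)
```

Each margin keeps its intended distribution. The correlation between `y1` and `y2`, however, generally comes out smaller than `rho`, because `rho` is specified on the underlying normal scale.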