Publications by Ken Kleinman
Example 7.36: Propensity score stratification
In examples 7.34 and 7.35 we described methods using propensity scores to account for possible confounding factors in an observational study. In addition to adjusting for the propensity score in a multiple regression and matching on the propensity score, researchers will often stratify by the propensity score, and carry out analyses wi...
Example 7.42: Testing the proportionality assumption
In addition to the non-parametric tools discussed in recent entries, it’s common to use proportional hazards regression (section 4.3.1), also called Cox regression, in evaluating survival data. It’s important in such models to test the proportionality assumption. Below, we demonstrate doing this for a simple model from the HELP da...
Example 8.1: Digits of Pi
Do the digits of Pi appear in a random order? If so, the trillions of digits of Pi calculated can serve as a useful random number generator. This post was inspired by this entry on Matt Asher’s blog. Generating pseudo-random numbers is a key piece of much of modern statistical practice, whether for Markov chain Monte Carlo applic...
Example 8.2: Digits of Pi, redux
In example 8.1, we considered some simple tests for the randomness of the digits of Pi. Here we develop a different test and implement it. If each digit appears in each place with equal and independent probability, then the number of places between recurrences of a digit should follow Pr(gap = x) = .9^x * .1: the probability the digit recurs im...
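The gap formula above can be checked numerically. What follows is a minimal sketch in Python (not the R or SAS the entry itself uses), and it assumes pseudo-random digits as a stand-in for the digits of Pi. The helper names `gap_pmf` and `observed_gaps` are ours, chosen for illustration.

```python
import random
from collections import Counter

def gap_pmf(x, p=0.1):
    """Pr(gap = x): x places without the digit, then the digit recurs."""
    return (1 - p) ** x * p

def observed_gaps(digits, target):
    """Count the places strictly between successive occurrences of target."""
    positions = [i for i, d in enumerate(digits) if d == target]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

# Assumption: seeded pseudo-random digits stand in for the digits of Pi.
rng = random.Random(2010)
digits = [rng.randrange(10) for _ in range(100_000)]

gaps = observed_gaps(digits, target=7)
counts = Counter(gaps)
n_gaps = len(gaps)

# Compare the observed share of each gap length with the formula.
for x in range(5):
    print(x, round(counts[x] / n_gaps, 4), round(gap_pmf(x), 4))
```

With a real digit string in place of `digits`, the observed and expected columns could feed a goodness-of-fit test.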
Summer hiatus
We’re taking a break from posting for most of August. We’ll be back in a month with new examples, including R- and SAS-applicable tricks and tools. Please drop us any ideas in the comments or by e-mail. We love feedback of any kind.
Example 8.4: Including subsetting conditions in output
A number of analyses perform operations on subsets. It is helpful to make clear in the output which observations have been included or excluded. SAS: The where statement (section A.6.3) is a powerful and useful tool for subsetting on the fly. (Other options for subsetting typically require data steps or data set options in the...
Example 8.5: bubble plots part 3
An anonymous commenter expressed a desire to see how one might use SAS to draw a bubble plot with bubbles in three colors, corresponding to a fourth variable in the data set. (x, y, z for bubble size, and the category variable.) In previous entries we discussed bubble plots and showed how to make the bubble print in two colors dep...
Example 8.6: Changing the reference category for categorical variables
How can we change the reference category for a categorical variable? This question comes up often in a consulting practice. When including categorical covariates in regression models, there is a question of how to incorporate the categories. One simple method is to generate indicator variables, sometimes called dummy variables. We g...
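To make the indicator-variable idea concrete, here is a small sketch in Python rather than the R and SAS the entry covers. The function `indicator_columns` and the `groups` data are hypothetical, invented only for illustration. Changing the reference category amounts to choosing which level gets no indicator column.

```python
def indicator_columns(values, reference):
    """Build 0/1 indicator columns for every level except the reference."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

# Hypothetical three-level covariate.
groups = ["low", "mid", "high", "mid", "low"]

# With "low" as reference, coefficients would compare "high" and "mid" to "low".
print(indicator_columns(groups, reference="low"))

# Switching the reference to "high" drops that column instead.
print(indicator_columns(groups, reference="high"))
```

Observations in the reference category have zeros in every column, so each regression coefficient on an indicator is a contrast with that category.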
Example 8.7: Hosmer and Lemeshow goodness-of-fit
The Hosmer and Lemeshow goodness of fit (GOF) test is a way to assess whether there is evidence for lack of fit in a logistic regression model. Simply put, the test compares the expected and observed number of events in bins defined by the predicted probability of the outcome. This can be calculated in R and SAS. R: In R, we write a si...
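The binned comparison described above can be sketched as follows. This is an illustrative Python version, not the entry's R or SAS code, and it assumes equal-size bins formed by sorting on the predicted probability, at least one observation per bin, and mean predictions strictly between 0 and 1 in every bin. The function name `hosmer_lemeshow` and the simulated data are ours.

```python
import random

def hosmer_lemeshow(y, p, g=10):
    """Return the Hosmer-Lemeshow statistic and its usual g - 2 degrees of freedom."""
    pairs = sorted(zip(p, y))  # order observations by predicted probability
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        chunk = pairs[k * n // g:(k + 1) * n // g]  # k-th bin
        n_k = len(chunk)
        observed = sum(yi for _, yi in chunk)  # observed events in the bin
        expected = sum(pi for pi, _ in chunk)  # expected events in the bin
        pbar = expected / n_k
        stat += (observed - expected) ** 2 / (n_k * pbar * (1 - pbar))
    return stat, g - 2

# Simulated data (an assumption for illustration): outcomes drawn from known probabilities.
rng = random.Random(87)
p_true = [rng.uniform(0.05, 0.95) for _ in range(5000)]
y = [1 if rng.random() < pi else 0 for pi in p_true]

# Deliberately miscalibrated predictions, shrunk toward 0.5.
p_shrunk = [0.5 + 0.5 * (pi - 0.5) for pi in p_true]

stat_good, df = hosmer_lemeshow(y, p_true)
stat_bad, _ = hosmer_lemeshow(y, p_shrunk)
print(round(stat_good, 2), round(stat_bad, 2), df)
```

The statistic is usually referred to a chi-square distribution with g - 2 degrees of freedom to obtain a p-value; that step is omitted here to keep the sketch free of external libraries.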
Example 8.8: more Hosmer and Lemeshow
This is a special R-only entry. In Example 8.7, we showed the Hosmer and Lemeshow goodness-of-fit test. Today we demonstrate more advanced computational approaches for the test. If you write a function for your own use, it hardly matters what it looks like, as long as it works. But if you want to share it, you might build in some warn...