Publications by Ken Kleinman
Example 7.25: compare draws with distribution
In Example 7.24, we demonstrated a Metropolis-Hastings algorithm for generating observations from awkward distributions. In such settings it is desirable to assess the quality of the draws by comparing them with the target distribution. Recall that the density function is f(y) = c e^(-y^4)(1 + |y|)^3. The constant c was not needed to gen...
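A minimal Python sketch of both steps follows (the post itself works in R and SAS). It assumes a random-walk Metropolis-Hastings sampler with a normal proposal of standard deviation 1, which is our choice and not necessarily the one used in Example 7.24. Because the proposal is symmetric, only the ratio of unnormalized densities is needed, so c cancels. Comparing draws with the target does require c, approximated here by Simpson's rule. The helper names `unnormalized`, `simpson` and `metropolis_hastings` are hypothetical.

```python
import math
import random

def unnormalized(y):
    """Target density without its constant c: exp(-y**4) * (1 + |y|)**3."""
    return math.exp(-y ** 4) * (1 + abs(y)) ** 3

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# c is the reciprocal of the total mass; the mass beyond |y| = 6 is negligible.
C = 1.0 / simpson(unnormalized, -6.0, 6.0, 20000)

def metropolis_hastings(n_draws, step=1.0, start=0.0, seed=42):
    """Random-walk Metropolis-Hastings with a normal(0, step) proposal."""
    rng = random.Random(seed)
    y, draws = start, []
    for _ in range(n_draws):
        proposal = y + rng.gauss(0.0, step)
        # Accept with probability min(1, f(proposal) / f(y)); c cancels in the ratio.
        if rng.random() < unnormalized(proposal) / unnormalized(y):
            y = proposal
        draws.append(y)
    return draws

draws = metropolis_hastings(50000)
edges = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
for a, b in zip(edges, edges[1:]):
    observed = sum(a <= d < b for d in draws) / len(draws)
    expected = C * simpson(unnormalized, a, b)
    print(f"[{a:4.1f}, {b:4.1f}): observed {observed:.3f}, expected {expected:.3f}")
```

Close agreement between the observed and expected bin probabilities is evidence that the chain is sampling from the intended density. Plotting a histogram of `draws` with C times `unnormalized` overlaid gives the graphical version of the same check.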
Example 7.26: probability question
Here’s a surprising problem from the xkcd blog. Suppose I choose two (different) real numbers, by any process I choose. Then I select one at random (p = 0.5) to show Nick. Nick must guess whether the other is smaller or larger. Being right 50% of the time is easy. Can he do better? Of course, it wouldn’t be an interesting questio...
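One well-known strategy can be checked by simulation. Nick draws a random threshold from any distribution with positive density on the whole real line. If the number he is shown is below the threshold, he guesses that the other is larger; otherwise he guesses smaller. The sketch below assumes a standard normal threshold, which is our choice, and is not presented as the solution given in the post. It also computes the exact success probability, 0.5 + 0.5 × P(threshold lies between the two numbers), which exceeds 0.5 for every pair.

```python
import random
from statistics import NormalDist

def nick_is_right(a, b, rng):
    """One round: show a or b with probability 0.5, then guess using a random threshold."""
    shown, hidden = (a, b) if rng.random() < 0.5 else (b, a)
    threshold = rng.gauss(0.0, 1.0)      # assumed: standard normal threshold
    guess_larger = shown < threshold     # small shown value -> guess the other is larger
    return guess_larger == (hidden > shown)

def simulated_rate(a, b, trials=100_000, seed=1):
    rng = random.Random(seed)
    return sum(nick_is_right(a, b, rng) for _ in range(trials)) / trials

def exact_rate(a, b):
    """0.5 + 0.5 * P(threshold falls between the two numbers)."""
    cdf = NormalDist().cdf
    lo, hi = sorted((a, b))
    return 0.5 + 0.5 * (cdf(hi) - cdf(lo))

for pair in [(-1.0, 1.0), (0.0, 0.1), (10.0, 11.0)]:
    print(pair, round(simulated_rate(*pair), 4), round(exact_rate(*pair), 4))
```

The advantage depends on the pair. By choosing numbers far out in the tail of the threshold distribution, the person choosing can push Nick's success rate as close to 50% as desired, though never below it.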
Example 7.27: probability question reconsidered
In Example 7.26, we considered a problem from the xkcd blog: Suppose I choose two (different) real numbers, by any process I choose. Then I select one at random (p = 0.5) to show Nick. Nick must guess whether the other is smaller or larger. Being right 50% of the time is easy. Can he do better? Randall Munroe offers a solution which ...
Example 7.28: Bubble plots
A bubble plot is a means of displaying three variables in a scatterplot. The third (z) dimension is represented by the size of the plot symbol, typically a circle: either the area or the radius of the plotted circle is proportional to the value of the third variable. This can be a very effective data presentation method. For example, consider Andrew Gelma...
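The choice between area and radius scaling matters, because area grows with the square of the radius. The following minimal sketch, assuming non-negative values, computes circle radii under either convention so that the largest value gets a fixed maximum radius. The function name, the 20-unit maximum and the example values are hypothetical.

```python
import math

def bubble_radii(values, max_radius=20.0, proportional_to="area"):
    """Radii for bubble-plot circles; the largest value gets max_radius.

    proportional_to="area":   circle area is proportional to the value,
                              so the radius is proportional to sqrt(value)
    proportional_to="radius": circle radius is proportional to the value
    Values are assumed to be non-negative.
    """
    top = max(values)
    if proportional_to == "area":
        return [max_radius * math.sqrt(v / top) for v in values]
    if proportional_to == "radius":
        return [max_radius * v / top for v in values]
    raise ValueError("proportional_to must be 'area' or 'radius'")

z = [1, 2, 4]  # hypothetical third-variable values
print(bubble_radii(z))                           # area scaling
print(bubble_radii(z, proportional_to="radius"))  # radius scaling
```

With area scaling, doubling z multiplies the radius by sqrt(2). With radius scaling, it doubles the radius and so quadruples the visible area, which exaggerates differences. Some plotting libraries take marker size as an area (matplotlib's `scatter`, for example, uses points squared), so check which convention a given function expects.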
Example 7.29: Bubble plots colored by a fourth variable
In Example 7.28, we generated a bubble plot showing the relationship among CESD, age, and number of drinks, for women. An anonymous commenter asked whether it would be possible to color the circles according to gender. In the comments, we showed simple code for this in R and hinted at a SAS solution for two colors. Here we show in ...
Example 7.30: Simulate censored survival data
To simulate survival data with censoring, we need to model the hazard functions for both time to event and time to censoring. We simulate both times from a Weibull distribution with a shape parameter of 1 (this is equivalent to an exponential random variable). The event time has a Weibull scale parameter of 0.002 times a linea...
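Here is a minimal Python sketch of the approach. The excerpt is cut off, so several pieces are assumptions. We take the event-time scale to be 0.002 × exp(−linear predictor), which makes the event hazard proportional to exp(linear predictor). The two standard normal covariates, their coefficients and the censoring scale of 0.003 are hypothetical. Note that Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import math
import random

def simulate_censored(n, coefs=(1.0, -0.5), event_scale=0.002,
                      censor_scale=0.003, seed=7):
    """Return (covariates, observed_time, event_indicator) tuples.

    coefs and censor_scale are hypothetical illustration values.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in coefs]
        lp = sum(b * xi for b, xi in zip(coefs, x))   # linear predictor
        # weibullvariate(alpha, beta): alpha = scale, beta = shape.
        # Shape 1 makes both times exponential; the event hazard is exp(lp) / event_scale.
        t_event = rng.weibullvariate(event_scale * math.exp(-lp), 1.0)
        t_censor = rng.weibullvariate(censor_scale, 1.0)
        observed = min(t_event, t_censor)             # only the earlier time is seen
        event = 1 if t_event <= t_censor else 0       # 0 means censored
        rows.append((x, observed, event))
    return rows

data = simulate_censored(1000)
censored_share = 1 - sum(e for _, _, e in data) / len(data)
print(f"proportion censored: {censored_share:.2f}")
```

The observed time and event indicator are exactly what a Cox or parametric survival model expects as its response. Fitting such a model to the simulated data should approximately recover the chosen coefficients.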
Example 7.31: Contour plot of BMI by weight and height
A contour plot is a simple way to plot a surface in two dimensions: lines of constant Z value are plotted on the X-Y plane. Typical uses include weather maps displaying “isobars” (lines of constant pressure) and maps displaying lines of constant elevation, useful in, e.g., hiking. Unusual examples include maps of constant tra...
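For BMI the contour lines can be written down directly. Since BMI = weight / height² (assuming kilograms and metres), the line for level b is weight = b × height². The minimal sketch below traces lines for the commonly used cut points 18.5, 25 and 30 over a hypothetical range of heights, instead of calling a plotting library's contour routine. It is not the code from the post.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def contour_line(level, heights):
    """(height, weight) points where BMI equals `level`: weight = level * height**2."""
    return [(h, level * h ** 2) for h in heights]

heights = [1.50 + 0.05 * i for i in range(11)]   # 1.50 m to 2.00 m
for level in (18.5, 25.0, 30.0):
    line = contour_line(level, heights)
    # Print the weights at 1.50 m, 1.75 m and 2.00 m on each contour.
    print(level, [f"{w:.1f}" for _, w in line[::5]])
```

Each contour is a parabola in the height-weight plane. This is why the lines fan out at greater heights: the weight gap between BMI levels widens as height increases.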
Example 7.33: Specifying fonts in graphics
For interactive data analysis, the default fonts used by SAS and R are acceptable, if not beautiful. However, for publication, it may be important to manipulate the fonts. For example, it would be desirable for the fonts in legends, axis labels, or other text printed in plots to approximate the typeface used in the rest of the text....
Example 7.34: Propensity scores and causal inference from observational studies
Propensity scores can be used to help make causal interpretation of observational data more plausible, by adjusting for other factors that may be responsible for differences between groups. Heuristically, we estimate the probability of exposure, rather than randomize exposure, as we’d ideally prefer to do. The estimated probability o...
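The propensity score is usually estimated with a logistic regression of exposure on the confounders. In R this is typically `glm(..., family = binomial)`, and in SAS `PROC LOGISTIC`. To show what that fit does, here is a minimal, dependency-free Python sketch with one confounder, fitted by Newton-Raphson. The simulated data and the true coefficients (−0.5, 1.0) are hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(x, z, iters=25):
    """Newton-Raphson fit of logit P(z = 1 | x) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += zi - p              # score (gradient of the log-likelihood)
            g1 += (zi - p) * xi
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # solve the 2x2 system for the Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical observational data: exposure is more likely for larger x.
rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
z = [1 if rng.random() < sigmoid(-0.5 + 1.0 * xi) else 0 for xi in x]

b0, b1 = fit_logistic(x, z)
propensity = [sigmoid(b0 + b1 * xi) for xi in x]   # estimated P(exposed | x)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

The fitted `propensity` values can then be used as a covariate in the outcome model, as strata, or for matching exposed and unexposed subjects.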
Example 7.35: Propensity score matching
As discussed in Example 7.34, it’s sometimes preferable to match on propensity scores, rather than adjust for them as a covariate. In SAS, we use a suite of macros written by Jon Kosanke and Erik Bergstralh at the Mayo Clinic. The dist macro calculates the pairwise distances between observations, while the vmatch macro makes matches base...
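The Mayo Clinic macros are SAS code and are not reproduced here. As a simpler stand-in, the sketch below does greedy 1:1 nearest-neighbor matching on the propensity score, without replacement and within a caliper. This is not necessarily the algorithm vmatch uses. The function name, the 0.05 caliper and the example scores are hypothetical.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, controls: dicts mapping subject id -> propensity score.
    Returns (treated_id, control_id) pairs; each control is used at most once,
    and treated subjects with no control within `caliper` stay unmatched.
    """
    available = dict(controls)
    pairs = []
    # Process treated subjects in order of increasing score (one common convention).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]          # match without replacement
    return pairs

treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
controls = {"c1": 0.28, "c2": 0.35, "c3": 0.60, "c4": 0.10}
print(greedy_match(treated, controls))
```

Greedy matching is order-dependent. Optimal matching instead minimizes the total distance over all pairs at once, at greater computational cost. Here t3 stays unmatched because no control lies within the caliper.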