Publications by Ken Kleinman

Example 9.24: Changing the parameterization for categorical predictors

22.03.2012

In our book, we discuss the important question of how to assign different parameterizations to categorical variables when fitting models (section 3.1.3). We show code in R for use in the lm() function, as follows: lm(y ~ x, contrasts=list(x="contr.treatment")). This works great in lm() and some other functions, notably glm(). But for fu...

3718 sym R (1382 sym/5 pcs) 16 img

Example 9.25: It’s been a mighty warm winter? (Plot on a circular axis)

02.04.2012

Updated (see below). People here in the northeast US consider this to have been an unusually warm winter. Was it? The University of Dayton and the US Environmental Protection Agency maintain an archive of daily average temperatures that’s reasonably current. In the case of Albany, NY (the most similar of their records to our homes in the Massach...

5437 sym R (1549 sym/3 pcs) 18 img

Example 9.26: More circular plotting

09.04.2012

SAS’s Rick Wicklin showed a simple loess smoother for the temperature data we showed here. Then he came back with a better approach that does away with edge effects. Rick’s smoothing was calculated and plotted on a cartesian plane. In this entry we’ll explore another option or two for smoothing, and plot the results on the same circular ...

4153 sym R (2339 sym/4 pcs) 24 img

Example 9.27: Baseball and shrinkage

16.04.2012

To celebrate the beginning of the professional baseball season here in the US and Canada, we revisit a famous example of using baseball data to demonstrate statistical properties. In 1977, Bradley Efron and Carl Morris published a paper about the James-Stein estimator, the shrinkage estimator that has better mean squared error than the simple ...

3834 sym Python (2151 sym/3 pcs) 18 img

Example 9.31: Exploring multiple testing procedures

14.05.2012

In example 9.30 we explored the effects of adjusting for multiple testing using the Bonferroni and Benjamini-Hochberg (or false discovery rate, FDR) procedures. At the time we claimed that it would probably be inappropriate to extract the adjusted p-values from the FDR method from their context. In this entry we attempt to explain o...

4610 sym R (1790 sym/4 pcs) 14 img
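
As a rough illustration of the two adjustments this entry refers to, here is a minimal Python sketch (standard library only). It is not the code from the post; the function names and the example p-values are made up for illustration. Bonferroni multiplies each p-value by the number of tests; Benjamini-Hochberg multiplies the p-value of rank r by m/r and then forces the adjusted values to be monotone in the raw ones.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values, in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the one ranked just above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
pvals = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

Note that each Benjamini-Hochberg adjusted value depends on the whole set of p-values, which is one way to see why lifting a single adjusted value out of its context can mislead.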

Example 9.32: Multiple testing simulation

21.05.2012

In examples 9.30 and 9.31 we explored corrections for multiple testing and then extracted p-values adjusted by the Benjamini and Hochberg (or FDR) procedure. In this post we’ll develop a simulation to explore the impact of “strong” and “weak” control of the family-wise error rate offered in multiple comparison corrections....

4735 sym R (1874 sym/4 pcs) 14 img

Example 9.33: Multiple imputation, rounding, and bias

29.05.2012

Nick has a paper in the American Statistician warning about bias in multiple imputation arising from rounding data imputed under a normal assumption. One example where you might run afoul of this is if the data are truly dichotomous or count variables, but you model them as normal (either because your software is unable to model dichot...

5152 sym R (2886 sym/6 pcs) 14 img

Example 9.34: Bland-Altman type plot

05.06.2012

The Bland-Altman plot is a visual aid for assessing differences between two ways of measuring something. For example, one might compare two scales this way, or two devices for measuring particulate matter. The plot simply displays the difference between the measures against their average. Rather than a statistical test, it is intended to demons...

3974 sym Python (2416 sym/3 pcs) 18 img
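
The quantities behind the plot described in this entry can be sketched in a few lines of Python (standard library only). This is an illustrative sketch, not the post's code: the function names and the toy measurements are hypothetical, and the 1.96 multiplier for the limits of agreement is the conventional choice rather than something stated in the excerpt.

```python
from statistics import mean, stdev


def bland_altman_points(a, b):
    """Return (averages, differences) for paired measurements a and b."""
    if len(a) != len(b):
        raise ValueError("a and b must be paired (same length)")
    averages = [(x + y) / 2 for x, y in zip(a, b)]
    differences = [x - y for x, y in zip(a, b)]
    return averages, differences


def limits_of_agreement(differences, z=1.96):
    """Return (lower, bias, upper): mean difference -/+ z sample SDs."""
    bias = mean(differences)
    sd = stdev(differences)
    return bias - z * sd, bias, bias + z * sd


# Hypothetical readings from two scales, for illustration only.
scale_a = [10.0, 12.0, 14.0, 16.0]
scale_b = [9.0, 13.0, 13.0, 17.0]
avg, diff = bland_altman_points(scale_a, scale_b)
low, bias, high = limits_of_agreement(diff)
```

Plotting diff against avg, with horizontal lines at low, bias, and high, gives the usual display.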

Example 9.35: Discrete randomization and formatted output

18.06.2012

A colleague asked for help with randomly choosing a kid within a family. This is for a trial in which families are recruited at well-child visits, but in each family only one of the children having a well-child visit that day can be in the study. The idea is that after recruiting the family, the research assistant needs to choose on...

3396 sym Python (7415 sym/5 pcs) 14 img
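
One simple way to do the kind of selection this entry describes, sketched in Python (standard library only), is to draw a child number uniformly at random. This is an assumption about the setup rather than the post's method: the function name, the 1-based child numbering, and the seed are hypothetical.

```python
import random


def pick_child(n_children, rng=random):
    """Pick one child, numbered 1..n_children, each with equal probability."""
    if n_children < 1:
        raise ValueError("a family needs at least one eligible child")
    return rng.randint(1, n_children)


# A seeded generator makes the draws reproducible, which helps if the
# assignments are prepared ahead of time (the seed is arbitrary).
rng = random.Random(2012)
draws = [pick_child(3, rng) for _ in range(300)]
```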

Example 9.36: Levene’s test for equal variances

25.06.2012

The assumption of equal variances among the groups in analysis of variance is an expression of the assumption of homoscedasticity for linear models more generally. For ANOVA, this assumption can be tested via Levene’s test. The test is a function of the residuals and means within each group, though various modifications are used, ...

2234 sym R (1453 sym/3 pcs) 14 img
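
As a hedged sketch of the statistic this entry describes, the Python below (standard library only) computes Levene's W as a one-way ANOVA F statistic on the absolute deviations of each observation from its group center. It is not the post's code: the function name and toy data are hypothetical, and no p-value is computed (W would be referred to an F distribution with k - 1 and N - k degrees of freedom). Passing center=median gives the commonly used median-based modification.

```python
from statistics import mean, median


def levene_statistic(groups, center=mean):
    """Levene's W: ANOVA F statistic on |x - center(group)|."""
    # Absolute deviations from each group's center.
    z = [[abs(x - center(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    within = sum((x - mean(g)) ** 2 for g in z for x in g)
    # Raises ZeroDivisionError if every deviation equals its group mean.
    return (n_total - k) / (k - 1) * between / within


# Hypothetical data: the second group is twice as spread out.
groups = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]
w_mean = levene_statistic(groups)
w_median = levene_statistic(groups, center=median)
```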