Publications by Ken Kleinman
Example 9.24: Changing the parameterization for categorical predictors
In our book, we discuss the important question of how to assign different parameterizations to categorical variables when fitting models (section 3.1.3). We show code in R for use in the lm() function, as follows: lm(y ~ x, contrasts = list(x = "contr.treatment")). This works great in lm() and some other functions, notably glm(). But for fu...
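Treatment (dummy) coding can also be sketched outside R. The Python sketch below assumes NumPy is available. treatment_coding is a hypothetical helper, not part of any library. It builds an intercept plus one 0/1 column for each non-reference level, which is roughly what contr.treatment produces in R. A least-squares fit on made-up data then shows what the coefficients mean under this parameterization.

```python
import numpy as np

def treatment_coding(x, reference=None):
    """Hypothetical helper: intercept plus one 0/1 column per non-reference level."""
    levels = sorted(set(x))
    if reference is None:
        reference = levels[0]  # like R's default, the first (sorted) level is the baseline
    others = [lev for lev in levels if lev != reference]
    x = np.asarray(x)
    columns = [np.ones(len(x))] + [(x == lev).astype(float) for lev in others]
    names = ["(Intercept)"] + [f"x{lev}" for lev in others]
    return names, np.column_stack(columns)

# Toy data (made up for illustration): group means are a = 1.5, b = 3.5, c = 6
x = ["a", "b", "c", "a", "b"]
y = np.array([1.0, 3.0, 6.0, 2.0, 4.0])
names, X = treatment_coding(x)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(names, coef.round(3))))
```

The intercept estimates the mean of the baseline level a (1.5). The xb and xc coefficients estimate the differences b minus a (2.0) and c minus a (4.5).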
Example 9.25: It’s been a mighty warm winter? (Plot on a circular axis)
Updated (see below). People here in the northeast US consider this to have been an unusually warm winter. Was it? The University of Dayton and the US Environmental Protection Agency maintain an archive of daily average temperatures that’s reasonably current. In the case of Albany, NY (the most similar of their records to our homes in the Massach...
Example 9.26: More circular plotting
SAS’s Rick Wicklin showed a simple loess smoother for the temperature data we showed here. Then he came back with a better approach that does away with edge effects. Rick’s smoothing was calculated and plotted on a Cartesian plane. In this entry we’ll explore another option or two for smoothing, and plot the results on the same circular ...
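Wrapping around the year end is one simple way to avoid edge effects. For data that repeat every year, a smoother can borrow values across the December/January boundary instead of stopping there. The sketch below is a centered moving average with wrap-around padding, written in Python with NumPy. It illustrates the idea with a swapped-in technique. It is not Rick Wicklin's method or the smoother used in the post, and circular_moving_average is a hypothetical name.

```python
import numpy as np

def circular_moving_average(y, window):
    """Centered moving average for a periodic series (e.g., 365 daily values)."""
    y = np.asarray(y, dtype=float)
    if window % 2 == 0 or window < 3 or window > y.size:
        raise ValueError("window must be odd, at least 3, and no longer than the series")
    half = window // 2
    # Pad each end with values from the opposite end, so the first day
    # borrows from the last days of the year rather than being cut off.
    padded = np.concatenate([y[-half:], y, y[:half]])
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")  # same length as y

smooth = circular_moving_average([0, 0, 0, 0, 9], window=3)
print(smooth)  # the spike at the end also raises the first value
```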
Example 9.27: Baseball and shrinkage
To celebrate the beginning of the professional baseball season here in the US and Canada, we revisit a famous example of using baseball data to demonstrate statistical properties. In 1977, Bradley Efron and Carl Morris published a paper about the James-Stein estimator, the shrinkage estimator that has better mean squared error than the simple ...
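The shrinkage idea fits in a few lines. This is a minimal sketch in Python with NumPy. It assumes k independent estimates with a common, known sampling variance sigma2, shrunk toward their grand mean with factor c = 1 - (k - 3) * sigma2 / sum((y - ybar)^2). It uses the positive-part variant, which truncates c at zero. The true values in the simulation are hypothetical, not the players' batting averages from the paper.

```python
import numpy as np

def james_stein(y, sigma2):
    """Positive-part James-Stein estimates, shrinking toward the grand mean.

    y: k >= 4 observed means; sigma2: their common, known sampling variance.
    """
    y = np.asarray(y, dtype=float)
    k = y.size
    if k < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 groups")
    ybar = y.mean()
    ss = np.sum((y - ybar) ** 2)
    c = max(0.0, 1 - (k - 3) * sigma2 / ss)   # shrinkage factor, truncated at 0
    return ybar + c * (y - ybar)

# Toy check: mean 0.275, sum of squares 0.0125, so c = 1 - 0.001 / 0.0125 = 0.92
print(james_stein([0.20, 0.30, 0.25, 0.35], sigma2=0.001))

# Small simulation with hypothetical true values: total squared error over all runs
rng = np.random.default_rng(1)
theta = np.linspace(0.22, 0.32, 10)
sigma2 = 0.002
err_mle = err_js = 0.0
for _ in range(2000):
    y = theta + rng.normal(0.0, np.sqrt(sigma2), size=theta.size)
    err_mle += np.sum((y - theta) ** 2)
    err_js += np.sum((james_stein(y, sigma2) - theta) ** 2)
print(err_js < err_mle)  # shrinkage wins on total squared error
```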
Example 9.31: Exploring multiple testing procedures
In example 9.30 we explored the effects of adjusting for multiple testing using the Bonferroni and Benjamini-Hochberg (or false discovery rate, FDR) procedures. At the time, we claimed that it would probably be inappropriate to extract the FDR-adjusted p-values from their context. In this entry we attempt to explain o...
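Both adjustments are short to compute. The Python sketch below assumes NumPy and follows the standard definitions, the same ones R's p.adjust uses for methods "bonferroni" and "BH". The function names are illustrative. The last lines suggest one reason an FDR-adjusted p-value is hard to read on its own. The same raw p-value of 0.03 gets a different BH-adjusted value depending on the other p-values in the family. Its Bonferroni value depends only on how many tests there are.

```python
import numpy as np

def bonferroni(p):
    """Bonferroni-adjusted p-values: m * p, capped at 1."""
    p = np.asarray(p, dtype=float)
    return np.minimum(1.0, p * p.size)

def benjamini_hochberg(p):
    """BH (FDR) adjusted p-values via the step-up definition."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices of ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)   # m/i * p_(i)
    # adjusted p_(i) = min over j >= i of m/j * p_(j), capped at 1
    adj_sorted = np.minimum(1.0, np.minimum.accumulate(scaled[::-1])[::-1])
    adj = np.empty(m)
    adj[order] = adj_sorted                       # back to the original order
    return adj

family_1 = [0.03, 0.5, 0.5, 0.5]
family_2 = [0.03, 0.001, 0.001, 0.001]
print(benjamini_hochberg(family_1)[0], benjamini_hochberg(family_2)[0])  # 0.12 vs 0.03
print(bonferroni(family_1)[0], bonferroni(family_2)[0])                  # 0.12 in both
```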
Example 9.32: Multiple testing simulation
In examples 9.30 and 9.31 we explored corrections for multiple testing and then extracting p-values adjusted by the Benjamini and Hochberg (or FDR) procedure. In this post we’ll develop a simulation to explore the impact of “strong” and “weak” control of the family-wise error rate offered in multiple comparison corrections....
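A stripped-down version of this kind of simulation is sketched below in Python with NumPy. It treats p-values from true nulls as Uniform(0, 1). As a simplifying assumption, it gives false nulls tiny p-values, standing in for hypothetical very strong effects. When every null is true, both Bonferroni and BH keep the family-wise error rate near alpha (weak control). When some nulls are false, BH no longer holds the error rate for the true nulls at alpha, while Bonferroni still does (strong control). Function names and settings are illustrative, not the ones from the post.

```python
import numpy as np

def bh_reject(p, alpha):
    """Boolean vector of Benjamini-Hochberg rejections for one family of tests."""
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest rank meeting its threshold
        reject[order[: k + 1]] = True    # step-up: reject everything up to that rank
    return reject

def fwer(n_true=20, n_false=0, n_sims=4000, alpha=0.05, seed=1):
    """Share of simulated families in which at least one true null is rejected."""
    rng = np.random.default_rng(seed)
    m = n_true + n_false
    null = np.arange(m) < n_true          # first n_true tests are true nulls
    hits = {"none": 0, "bonferroni": 0, "bh": 0}
    for _ in range(n_sims):
        # True nulls: Uniform(0, 1). False nulls: assumed tiny p-values.
        p = np.concatenate([rng.uniform(size=n_true),
                            rng.uniform(0.0, 1e-4, size=n_false)])
        hits["none"] += int(np.any(p[null] < alpha))
        hits["bonferroni"] += int(np.any(p[null] < alpha / m))
        hits["bh"] += int(np.any(bh_reject(p, alpha)[null]))
    return {name: count / n_sims for name, count in hits.items()}

weak = fwer(n_true=20, n_false=0)     # all nulls true
strong = fwer(n_true=10, n_false=10)  # half the nulls false
print("all nulls true:", weak)
print("some nulls false:", strong)
```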
Example 9.33: Multiple imputation, rounding, and bias
Nick has a paper in The American Statistician warning about bias in multiple imputation arising from rounding data imputed under a normal assumption. One example where you might run afoul of this is if the data are truly dichotomous or count variables, but you model them as normal (either because your software is unable to model dichot...
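The core mechanism can be shown without any imputation machinery. The Python sketch below assumes NumPy and is a deliberate simplification, not the paper's simulation. It takes a binary variable with a hypothetical prevalence of 0.2 and draws values from a normal distribution with the same mean and variance, which is how a normal imputation model would treat it. It then rounds each draw to 0 or 1. The unrounded draws average about 0.2, but the rounded ones average about 0.227. That is because the share of draws above 0.5 is P(Z > 0.75), roughly 0.227, not 0.2.

```python
import numpy as np

rng = np.random.default_rng(2024)
p = 0.2          # hypothetical prevalence of a truly binary variable
n = 200_000
# A normal model matched to 0/1 data: same mean p and variance p(1 - p)
draws = rng.normal(loc=p, scale=np.sqrt(p * (1 - p)), size=n)
# Round each draw to the nearer of 0 and 1
rounded = (draws >= 0.5).astype(float)

print(draws.mean())    # approximately 0.2
print(rounded.mean())  # approximately 0.227, i.e. P(Z > 0.75)
```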
Example 9.34: Bland-Altman type plot
The Bland-Altman plot is a visual aid for assessing differences between two ways of measuring something. For example, one might compare two scales this way, or two devices for measuring particulate matter. The plot simply displays the difference between the measures against their average. Rather than a statistical test, it is intended to demons...
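The quantities behind the plot are simple to compute. Here is a minimal Python sketch with NumPy. bland_altman is a hypothetical helper and the readings are made up. It also returns the conventional 95% limits of agreement, the mean difference plus or minus 1.96 standard deviations of the differences. These limits are often added to such plots but are not mentioned in the excerpt above.

```python
import numpy as np

def bland_altman(a, b):
    """Means, differences, mean difference (bias) and 95% limits of agreement."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mean = (a + b) / 2          # x-axis: average of the two measurements
    diff = a - b                # y-axis: their difference
    bias = diff.mean()
    sd = diff.std(ddof=1)       # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return mean, diff, bias, limits

# Made-up readings of the same four objects on two scales
scale_1 = [10.0, 12.0, 14.0, 16.0]
scale_2 = [11.0, 11.0, 15.0, 15.0]
mean, diff, bias, limits = bland_altman(scale_1, scale_2)
print(mean, diff, bias, limits)
```

To draw the plot, you could scatter diff against mean (for example with matplotlib's plt.scatter). Then add horizontal lines with plt.axhline at the bias and at the two limits.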
Example 9.35: Discrete randomization and formatted output
A colleague asked for help with randomly choosing a kid within a family. This is for a trial in which families are recruited at well-child visits, but in each family only one of the children having a well-child visit that day can be in the study. The idea is that after recruiting the family, the research assistant needs to choose on...
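One way to handle this kind of discrete randomization is to generate a sheet in advance, so the research assistant only has to look up a number. The Python sketch below uses a hypothetical design, not necessarily the one in the post. Each row covers one family and gives, for 2, 3, or 4 eligible children, which child to enroll. Children are numbered 1 to k by a fixed rule agreed beforehand (for example, oldest first). random.Random with a fixed seed keeps the sheet reproducible, and f-string field widths produce aligned, printable output.

```python
import random

def make_sheet(n_families, max_kids=4, seed=1):
    """Hypothetical randomization sheet: one row per family, one column per
    possible number of eligible children (2..max_kids)."""
    rng = random.Random(seed)               # fixed seed: the sheet can be regenerated
    sizes = list(range(2, max_kids + 1))
    lines = ["Family | " + " | ".join(f"{k} kids" for k in sizes)]
    picks = []
    for fam in range(1, n_families + 1):
        row = [rng.randint(1, k) for k in sizes]   # randint includes both ends: 1..k
        picks.append(row)
        lines.append(f"{fam:>6} | " + " | ".join(f"{c:>6}" for c in row))
    return picks, "\n".join(lines)

picks, sheet = make_sheet(5)
print(sheet)
```

At each visit, the assistant takes the next unused row and reads the column for the number of eligible children. A 3 in the "3 kids" column, for instance, would mean enrolling the third child under the agreed ordering.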
Example 9.36: Levene’s test for equal variances
The assumption of equal variances among the groups in analysis of variance is an expression of the assumption of homoscedasticity for linear models more generally. For ANOVA, this assumption can be tested via Levene’s test. The test is a function of the residuals and means within each group, though various modifications are used, ...
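Levene's statistic runs an ANOVA-type ratio on absolute deviations: W = (N - k)/(k - 1) * sum_i n_i (Zbar_i - Zbar)^2 / sum_i sum_j (Z_ij - Zbar_i)^2, where Z_ij = |Y_ij - Ybar_i|. The pure-Python sketch below computes W; levene_w is a hypothetical name. Using the group median instead of the mean gives the Brown-Forsythe modification. W is compared with an F distribution on k - 1 and N - k degrees of freedom. If SciPy is available, scipy.stats.levene(*groups, center="mean") computes the same statistic along with a p-value (its default is center="median").

```python
import statistics

def levene_w(*groups, center="mean"):
    """Levene's W; center="median" gives the Brown-Forsythe variant."""
    loc = statistics.mean if center == "mean" else statistics.median
    z = []
    for g in groups:
        c = loc(g)
        z.append([abs(y - c) for y in g])   # absolute deviations from the group center
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    zbar_i = [statistics.mean(zi) for zi in z]
    zbar = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - zbar) ** 2 for zi, m in zip(z, zbar_i))
    within = sum((v - m) ** 2 for zi, m in zip(z, zbar_i) for v in zi)
    if within == 0:
        raise ValueError("deviations are constant within every group; W is undefined")
    return (n_total - k) / (k - 1) * between / within

w = levene_w([1, 2, 3], [1, 3, 5])
print(w)
```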