Publications by Ken Kleinman

Example 2014.1: "Power" for a binomial probability, plus: News!

14.01.2014

Hello, folks! I’m pleased to report that Nick and I have turned in the manuscript for the second edition of SAS and R: Data Management, Statistical Analysis, and Graphics. It should be available this summer. New material includes some of our more popular blog posts, plus reproducible analysis, RStudio, and more. To celebrate, her...

5077 sym R (1375 sym/5 pcs) 14 img

Example 2014.2: Block randomization

22.01.2014

This week I had to block-randomize some units. This is ordinarily the sort of thing I would do in SAS, just because it would be faster for me. But I had already started work on the project in R, using knitr/LaTeX to make a PDF, so it made sense to continue the work in R. R: As is my standard practice now in both languages, I set thing u...

3171 sym R (2432 sym/4 pcs) 14 img

Example 2014.3: Allow different variances by group

27.02.2014

One common violation of the assumptions needed for linear regression is heteroscedasticity by group membership. Both SAS and R can easily accommodate this setting. Our data today comes from a real example of vitamin D supplementation of milk. Four suppliers claimed that their milk provided 100 IU of vitamin D. The null hypothesis is...

3673 sym R (2412 sym/9 pcs) 14 img

Example 2014.4: Hilbert Matrix

14.04.2014

Rick Wicklin showed how to make a Hilbert matrix in SAS/IML. Rick has a nice discussion of these matrices and why they might be interesting; the value of H_{r,c} is 1/(r+c-1). We show how to make this matrix in the data step and in R. We also show that Rick’s method for displaying fractions in SAS/IML works in PROC PRINT, and how they can be...

2734 sym R (750 sym/5 pcs) 14 img
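
The formula quoted in the Hilbert matrix entry above, H_{r,c} = 1/(r+c-1), is easy to try out. What follows is a minimal sketch in Python rather than the SAS data step or R code from the post itself; the function name `hilbert` is our own choice for illustration. Using exact fractions keeps the entries readable as 1/2, 1/3, and so on, instead of as decimals.

```python
from fractions import Fraction


def hilbert(n):
    """Return the n x n Hilbert matrix as a list of rows of exact fractions.

    Rows and columns are 1-indexed in the formula H[r, c] = 1 / (r + c - 1).
    The name `hilbert` is illustrative, not taken from the post.
    """
    return [
        [Fraction(1, r + c - 1) for c in range(1, n + 1)]
        for r in range(1, n + 1)
    ]


# Print a 3 x 3 example, one row per line, entries shown as fractions.
for row in hilbert(3):
    print("  ".join(str(entry) for entry in row))
```

The matrix is symmetric, since r and c enter the formula only through their sum.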

Example 2014.5: Simple mean imputation

25.04.2014

We’re both users of multiple imputation for missing data. We believe it is the most practical principled method for incorporating the most information into data analysis. In fact, one of our more successful collaborations is a review of software for multiple imputation. But, for me at least, there are times when a simpler form of i...

3372 sym R (229 sym/2 pcs) 14 img
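
As a rough illustration of what simple mean imputation does, here is a minimal Python sketch; it is not the SAS or R code from the post. It assumes missing values are represented as `None`, and the function name `mean_impute` is hypothetical. Each missing value is replaced by the mean of the observed values in the same variable.

```python
from statistics import mean


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values.

    Assumption for this sketch: missingness is coded as None.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# The two missing entries are filled with the mean of 1.0, 3.0 and 5.0.
print(mean_impute([1.0, None, 3.0, None, 5.0]))
```

Filling in the mean leaves the variable's mean unchanged but shrinks its variance, which is one reason multiple imputation is usually the more principled choice.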

Example 2014.6: Comparing medians and the Wilcoxon rank-sum test

12.06.2014

A colleague recently contacted us with the following question: “My outcome is skewed– how can I compare medians across multiple categories?” What they were asking for was a generalization of the Wilcoxon rank-sum test (also known as the Mann-Whitney-Wilcoxon test, among other monikers) to more than two groups. For the record, the answer ...

4586 sym R (2246 sym/8 pcs) 16 img

Example 2014.7: Simulate logistic regression with an interaction

24.06.2014

Reader Annisa Mike asked in a comment on an early post about power calculation for logistic regression with an interaction. This is a topic that has come up with increasing frequency in grant proposals and article submissions. We’ll begin by showing how to simulate data with the interaction, and in our next post we’ll show how to assess pow...

4126 sym R (1008 sym/7 pcs) 18 img
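
To make the simulation idea in the entry above concrete, here is a hedged Python sketch, not the post's SAS or R code. It assumes one dichotomous covariate (a fair coin flip) and one continuous covariate (standard normal); those distributions, the coefficient names `beta0` to `beta3`, and the function name are all our own assumptions for illustration.

```python
import math
import random


def simulate_logistic_interaction(n, beta0, beta1, beta2, beta3, seed=None):
    """Simulate n rows (x1, x2, y) from a logistic model with an interaction.

    Assumed model for this sketch:
        logit(P(y = 1)) = beta0 + beta1*x1 + beta2*x2 + beta3*x1*x2
    with x1 ~ Bernoulli(0.5) and x2 ~ Normal(0, 1).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.randint(0, 1)        # dichotomous covariate
        x2 = rng.gauss(0.0, 1.0)      # continuous covariate
        eta = beta0 + beta1 * x1 + beta2 * x2 + beta3 * x1 * x2
        p = 1.0 / (1.0 + math.exp(-eta))   # inverse logit
        y = 1 if rng.random() < p else 0   # Bernoulli draw with probability p
        rows.append((x1, x2, y))
    return rows


data = simulate_logistic_interaction(1000, -1.0, 0.5, 0.3, 0.8, seed=42)
print(data[:3])
```

Fixing the seed makes the simulated data set reproducible, which helps when the same data are later passed to a model-fitting step.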

Example 2014.8: Estimate power for an interaction, by simulation

30.06.2014

In our last entry, we demonstrated how to simulate data from a logistic regression with an interaction between a dichotomous and a continuous covariate. In this entry we show how to use the simulation to estimate the power to detect that interaction. This is a simple, elegant, and powerful idea: simply simulate data under the altern...

4478 sym R (1711 sym/9 pcs) 14 img
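
The general recipe for power by simulation is: simulate many data sets under the alternative, run the test on each, and report the proportion of rejections. The Python sketch below shows that recipe on a deliberately simpler problem than the post's logistic interaction: a two-sided normal-approximation z-test of a single binomial proportion. That swap, along with the function name and arguments, is our assumption for illustration and is not the post's code.

```python
import random
from statistics import NormalDist


def simulated_power(n, p0, p1, reps=1000, alpha=0.05, seed=None):
    """Estimate power to reject H0: p = p0 when the true proportion is p1.

    Uses a two-sided z-test with the null standard error (a swapped-in,
    simpler test than the logistic regression discussed in the post).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se0 = (p0 * (1 - p0) / n) ** 0.5
    rejections = 0
    for _ in range(reps):
        # Simulate one data set of n Bernoulli(p1) trials under the alternative.
        successes = sum(rng.random() < p1 for _ in range(n))
        z = (successes / n - p0) / se0
        if abs(z) > z_crit:
            rejections += 1
    # The estimated power is the proportion of simulated data sets that reject.
    return rejections / reps


print(simulated_power(n=100, p0=0.5, p1=0.7, reps=500, seed=2014))
```

For the interaction setting, the same loop would apply with the data-generating step and the test replaced by the logistic model and a test of the interaction term.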

Example 2014.9: Rolling averages. Also: Second Edition is shipping!

11.08.2014

As of today, the second edition of “SAS and R: Data Management, Statistical Analysis, and Graphics” is shipping from CRC Press, Amazon, and other booksellers. There are lots of additional examples from this blog, new organization, and other features we hope you’ll find useful. Thanks for your support. We’ll be continuing to blog. Now...

6414 sym R (1029 sym/7 pcs) 22 img

Example 2014.10: Panel by a continuous variable

18.08.2014

In Example 8.40, side-by-side histograms, we showed how to generate histograms for some continuous variable, for each level of a categorical variable in a data set. An anonymous reader asked how we would do this if both the variables were continuous. Keep the questions coming! SAS: The SAS solution we presented relied on the sgpanel procedure. ...

5243 sym R (982 sym/5 pcs) 22 img