Publications by Ken Kleinman

Proc report for simple statistics

30.10.2011

Ken Beath, of Macquarie University, commented on an earlier entry that the best way to generate summary statistics is using proc report. While the best tools might differ, depending on the purpose, we wanted to share Ken’s code demonstrating how to replicate the R mosaic package tables using proc report. SAS: Ken’s fully annotated c...

1449 sym R (3417 sym/3 pcs) 16 img

Example 9.13: Negative binomial regression with proc mcmc

08.11.2011

In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support for Bayesian methods in R can be found in the Bayesian Task View. SAS: As noted in example 8.30, th...

2074 sym R (1166 sym/4 pcs) 18 img

Example 9.15: Bar chart with error bars ("Dynamite plot")

22.11.2011

The “dynamite plot”, a bar chart plotting a mean with an error bar, is one of the most reviled types of image among statisticians. Reasons to dislike them are numerous, and are nicely summarized here. (Edward Tufte also suggests they be avoided.) Nonetheless, as consulting statisticians, we’re often required to meet the needs of our co...

1940 sym R (393 sym/2 pcs) 20 img

Example 9.16: Small multiples

29.11.2011

Small multiples are one of the great ideas of graphics visionary Edward Tufte (e.g., in Envisioning Information). Briefly, the idea is that if many variations on a theme are presented, differences quickly become apparent. Today we offer general guidance on creating figures with small multiples. As an example, we’ll show graphics for the popu...

5276 sym Python (4803 sym/6 pcs) 20 img

Example 9.18: Constructing the fastest relay team via enumeration

05.01.2012

In competitive swimming, the medley relay is a team event in which four different swimmers each swim one of the four strokes: freestyle, breaststroke, backstroke, and butterfly. In general, every swimmer might be able to swim any given stroke. How can we choose the fastest relay team? Here we solve this by enumerating all possible tea...

4534 sym R (3189 sym/9 pcs) 16 img
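The enumeration idea in Example 9.18 can be sketched in a few lines. This is a minimal Python illustration, not the code from the original entry: the swimmer names and times in `example_times` are invented, and `fastest_relay` is a hypothetical helper name. It tries every ordered choice of four distinct swimmers, assigns them to the four strokes in order, and keeps the team with the smallest total time.

```python
from itertools import permutations

STROKES = ("freestyle", "breaststroke", "backstroke", "butterfly")

# Hypothetical times in seconds, invented purely for illustration.
example_times = {
    "A": {"freestyle": 50, "breaststroke": 70, "backstroke": 60, "butterfly": 58},
    "B": {"freestyle": 52, "breaststroke": 65, "backstroke": 61, "butterfly": 59},
    "C": {"freestyle": 53, "breaststroke": 72, "backstroke": 57, "butterfly": 60},
    "D": {"freestyle": 54, "breaststroke": 71, "backstroke": 62, "butterfly": 55},
    "E": {"freestyle": 51, "breaststroke": 69, "backstroke": 63, "butterfly": 61},
}


def fastest_relay(times):
    """Return (total_time, {stroke: swimmer}) for the fastest team.

    Every ordered selection of four distinct swimmers is one candidate
    team: the first swims freestyle, the second breaststroke, and so on.
    """
    best_total, best_team = float("inf"), None
    for swimmers in permutations(times, len(STROKES)):
        total = sum(times[s][stroke] for s, stroke in zip(swimmers, STROKES))
        if total < best_total:
            best_total, best_team = total, dict(zip(STROKES, swimmers))
    return best_total, best_team


total, team = fastest_relay(example_times)
print(total, team)
```

With n swimmers there are n(n-1)(n-2)(n-3) candidate teams, so brute force is practical for a typical squad; a much larger pool would call for an assignment-problem solver instead.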

Example 9.19: Demonstrating the central limit theorem

11.01.2012

A colleague recently asked “why should the average get closer to the mean when we increase the sample size?” We should interpret this question as asking why the standard error of the mean gets smaller as n increases. The central limit theorem shows that (under certain conditions, of course) the standard error must do this, and that the mean...

3548 sym R (876 sym/4 pcs) 20 img
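The claim in Example 9.19 that the standard error of the mean shrinks as n grows can be checked by simulation. The following is a minimal Python sketch, not the code from the original entry. It assumes draws from an exponential distribution with mean 1 and standard deviation 1 (chosen only because it is visibly skewed), and `sd_of_sample_means` is a hypothetical helper name. The spread of the simulated sample means should be close to 1/sqrt(n).

```python
import math
import random
import statistics


def sd_of_sample_means(n, reps=2000, seed=1):
    """Simulate `reps` samples of size n and return the SD of their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return statistics.stdev(means)


for n in (4, 25, 100):
    simulated = sd_of_sample_means(n)
    theoretical = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1
    print(f"n={n:4d}  simulated={simulated:.3f}  theoretical={theoretical:.3f}")
```

A histogram of the simulated means for each n would also show the distribution becoming more symmetric and bell-shaped, which is the other half of the central limit theorem.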

SAS Macro Simplifies SAS and R integration

26.01.2012

Many of us feel very enthusiastic about R. It’s free, it features cutting-edge applications, it has a large community of users contributing for mutual benefit, and on and on. There are also many things to like about SAS, including stability, backwards compatibility, and professional support. The way to be the best analy...

3862 sym R (1581 sym/3 pcs) 16 img

RStudio in the cloud, for dummies

13.02.2012

You can have your own cloud computing version of R, complete with RStudio. Why should you? It’s cool! Plus, there’s a lot more power out there than you can easily get on your own hardware. And, it’s R in a web page. Run it from your tablet. Run it from work, even if you’re not supposed to install software. Run it from y...

8047 sym 16 img

Example 9.21: The birthday "problem" re-examined

23.02.2012

The so-called birthday paradox or birthday problem is simply the counterintuitive discovery that the probability of (at least) two people in a group sharing a birthday goes up surprisingly fast as the group size increases. If the group is only 23 people, there is a 50% chance that two of them share a birthday, and with 40 people it’s about 90...

5061 sym R (4689 sym/7 pcs) 20 img
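The figures quoted in Example 9.21 follow from the standard calculation, sketched here in Python under the usual simplifying assumptions of 365 equally likely birthdays and no leap years. The function name `p_shared_birthday` is hypothetical, and this is not the code from the original entry. The probability that at least two of k people share a birthday is one minus the probability that all k birthdays are distinct:

```python
def p_shared_birthday(k, days=365):
    """P(at least two of k people share a birthday), uniform birthdays."""
    p_all_distinct = 1.0
    for i in range(k):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for k in (10, 23, 40):
    print(k, round(p_shared_birthday(k), 4))
```

Under these assumptions the result is roughly 0.507 for 23 people and 0.891 for 40, matching the "50%" and "about 90" figures in the excerpt.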

Example 9.23: Demonstrating proportional hazards

13.03.2012

A colleague recently asked for a slide suitable for explaining proportional hazards. In particular, she was concerned that her audience not focus on the time to event or probability of the event. An initial thought was to display the cumulative hazards, which have a constant proportion if the model is true. But the colleague’s audience mig...

4541 sym R (2375 sym/7 pcs) 24 img
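The remark in Example 9.23 that cumulative hazards keep a constant proportion under the model can be illustrated numerically. This Python sketch is not the code from the original entry. It assumes a Weibull baseline with arbitrary shape and scale values and a hazard ratio of 2, and the function names are hypothetical. The ratio of cumulative hazards is the same at every time point, while the difference in survival probabilities is not.

```python
import math


def cumulative_hazard(t, hazard_ratio=1.0, shape=1.5, scale=10.0):
    """Weibull cumulative hazard H(t) = HR * (t / scale) ** shape (assumed form)."""
    return hazard_ratio * (t / scale) ** shape


def survival(t, hazard_ratio=1.0):
    """Survival function S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(t, hazard_ratio))


for t in (1.0, 5.0, 20.0):
    ratio = cumulative_hazard(t, hazard_ratio=2.0) / cumulative_hazard(t)
    gap = survival(t) - survival(t, hazard_ratio=2.0)
    print(f"t={t:5.1f}  H1/H0={ratio:.2f}  S0-S1={gap:.3f}")
```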