Publications by Ken Kleinman

Example 8.29: Risk ratios and odds ratios

07.03.2011

When can you safely think of an odds ratio as being similar to a risk ratio? Many people find odds ratios hard to interpret, and thus would prefer to have risk ratios. In response to this, you can find several papers that purport to convert an odds ratio (from a logistic regression) into a risk ratio. Conventional wisdom has it that “odds ratio...

2639 sym R (960 sym/2 pcs) 18 img
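
As a rough illustration of the question this entry raises, the sketch below computes both measures from a pair of risks. It is not taken from the post: the function names `risk_ratio` and `odds_ratio` and the example risks are assumptions chosen for illustration. It shows the familiar pattern that the two measures nearly agree when the outcome is rare and drift apart as it becomes common.

```python
def risk_ratio(p_exposed, p_unexposed):
    """Ratio of the two risks (probabilities of the outcome)."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the two odds, where odds = p / (1 - p)."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks: a rare outcome and a common one, each with a risk ratio of 2.
for p1, p0 in [(0.02, 0.01), (0.60, 0.30)]:
    print(f"risks {p1:.2f} vs {p0:.2f}: "
          f"RR = {risk_ratio(p1, p0):.3f}, OR = {odds_ratio(p1, p0):.3f}")
```

With the rare outcome the odds ratio is about 2.02, close to the risk ratio of 2; with the common outcome it is 3.5.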

Example 8.30: Compare Poisson and negative binomial count models

15.03.2011

How similar can a negative binomial distribution get to a Poisson distribution? When confronted with modeling count data, our first instinct is to use Poisson regression. But in practice, count data is often overdispersed. We can fit the overdispersion in the Poisson (Section 4.1) using quasi-likelihood methods, but a better alternative might be...

4432 sym R (1179 sym/3 pcs) 18 img

Example 8.31: Choropleth maps

22.03.2011

In our book, we show a simple example of a map (section 6.4.2) where we read the boundary files as data sets and use SAS and R to plot them. But both SAS and R have complex functionality for using pre-compiled map data. To demonstrate them, we’ll show how to make a simple choropleth map, using US Census data available here. The file also inc...

4805 sym R (1196 sym/8 pcs) 18 img

Example 8.32: The HistData package, sunflower plots, and getting data from R into SAS

29.03.2011

This entry is mainly a promotion of the fascinating HistData R package. The package, compiled by the psychologist, statistician, and graphics innovator Michael Friendly, contains a number of small data sets of historical interest. These include data from John Snow’s map of cholera in London, Minard’s map of Napoleon’s Russian campaign of ...

2894 sym R (1986 sym/4 pcs) 18 img

Example 8.33: Merging data sets one-to-many

05.04.2011

It’s often necessary to combine data from two data sets for further analysis. Such merging can be one-to-one, many-to-one, and many-to-many. The most common form is the one-to-one match, which we cover in section 1.5.7. Today we look at a one-to-many merge. Since the Major League Baseball season started last Thursday, we’ll use baseball as ...

3154 sym Python (1569 sym/7 pcs) 18 img

Example 8.35: Grab true (not pseudo) random numbers; passing API URLs to functions or macros

19.04.2011

Usually, we’re content to use a pseudo-random number generator. But sometimes we may want numbers that are actually random; an example might be for randomizing treatment status in a randomized controlled trial. The site Random.org provides truly random numbers based on radio static. For long simulations, its quota system may prev...

2799 sym Python (1047 sym/3 pcs) 14 img

Example 8.37: Read sheets from an Excel file

11.05.2011

Microsoft Excel is an awkward tool for data analysis. However, it is a reasonable environment for recording and transferring data. In our consulting practice, people frequently send us data in .xls (from Excel 97-2003) or .xlsx (from Excel 2007 or 2010) formatted files. In order to use the data in statistical software, you have to get...

2979 sym R (314 sym/2 pcs) 14 img

Example 8.39: Calculating Cramer’s V

03.06.2011

Cramer’s V is a measure of association for nominal variables. Effectively it is the Pearson chi-square statistic rescaled to have values between 0 and 1, as follows: V = sqrt(X^2 / [nobs * (min(ncols, nrows) - 1)]), where X^2 is the Pearson chi-square, nobs represents the number of observations included in the table, and where ncols and nrows a...

2027 sym R (1016 sym/4 pcs) 14 img
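
The formula in this entry can be sketched in a few lines. This is a minimal illustration, not the code from the post: the function name `cramers_v` and the example tables are assumptions, the Pearson chi-square is computed by hand rather than with a statistics library, and the sketch assumes no row or column of the table is entirely zero.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts.

    Assumes every row and column has at least one observation, so that
    all expected counts are positive.
    """
    nrows = len(table)
    ncols = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(nrows)) for j in range(ncols)]
    nobs = sum(row_totals)

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected
    chi_square = 0.0
    for i in range(nrows):
        for j in range(ncols):
            expected = row_totals[i] * col_totals[j] / nobs
            chi_square += (table[i][j] - expected) ** 2 / expected

    # V = sqrt(X^2 / [nobs * (min(ncols, nrows) - 1)])
    return math.sqrt(chi_square / (nobs * (min(ncols, nrows) - 1)))


# Hypothetical tables: perfect association, then no association.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[5, 5], [5, 5]]))
```

The first table, where row membership fully determines column membership, gives V = 1; the second, where the rows have identical distributions, gives V = 0.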

Example 8.40: Side-by-side histograms

13.06.2011

It’s often useful to compare histograms for some key variable, stratified by levels of some other variable. There are several ways to display something like this. The simplest may be to plot the two histograms in separate panels. SAS: In SAS, the most direct and generalizable approach is through the sgpanel procedure. proc sgpanel data = 'c:\book...

836 sym R (249 sym/2 pcs) 18 img

Example 8.41: Scatterplot with marginal histograms

20.06.2011

The scatterplot is one of the most ubiquitous and useful graphics. It’s also very basic. One of its shortcomings is that it can hide important aspects of the marginal distributions of the two variables. To address this weakness, you can add a histogram of each margin to the plot. We demonstrate using the SF-36 MCS and PCS subscales in the ...

1670 sym R (949 sym/3 pcs) 18 img