Publications by Ken Kleinman
Example 8.29: Risk ratios and odds ratios
When can you safely think of an odds ratio as being similar to a risk ratio? Many people find odds ratios hard to interpret, and thus would prefer to have risk ratios. In response to this, you can find several papers that purport to convert an odds ratio (from a logistic regression) into a risk ratio. Conventional wisdom has it that “odds ratio...
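As a rough illustration of the question this teaser raises, the sketch below computes both measures from the same pair of risks. It is not the post's own analysis: the function names and the example risks are assumptions chosen only for illustration, and the exposed risk is fixed at twice the unexposed risk so that the risk ratio is always 2.

```python
def risk_ratio(p_exposed, p_unexposed):
    """Ratio of the two outcome probabilities."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the two odds, where odds = p / (1 - p)."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical baseline risks; the exposed risk is always double the baseline.
for p0 in (0.01, 0.10, 0.30):
    p1 = 2 * p0
    print(f"baseline risk {p0:.2f}: RR = {risk_ratio(p1, p0):.3f}, "
          f"OR = {odds_ratio(p1, p0):.3f}")
```

With the risk ratio held at 2, the odds ratio stays close to 2 when the outcome is rare and moves further from it as the baseline risk grows.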
Example 8.30: Compare Poisson and negative binomial count models
How similar can a negative binomial distribution get to a Poisson distribution? When confronted with modeling count data, our first instinct is to use Poisson regression. But in practice, count data is often overdispersed. We can fit the overdispersion in the Poisson (Section 4.1) using quasi-likelihood methods, but a better alternative might be...
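One way to get a feel for the opening question is to compare the two probability mass functions directly. The sketch below is not the post's code: it assumes the mean/size parameterization of the negative binomial (mean mu, variance mu + mu^2/size), and the helper names, the mean of 3, and the size values are illustrative assumptions.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu, computed on the log scale."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def negbin_pmf(y, mu, size):
    """Negative binomial probability of count y with mean mu and dispersion
    parameter size, so that the variance is mu + mu**2 / size."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mu))
             + y * math.log(mu / (size + mu)))
    return math.exp(log_p)


def max_pmf_gap(mu, size, upto=60):
    """Largest absolute difference between the two pmfs over counts 0..upto."""
    return max(abs(negbin_pmf(y, mu, size) - poisson_pmf(y, mu))
               for y in range(upto + 1))


mu = 3.0  # hypothetical mean count
for size in (1, 10, 100, 10000):
    print(f"size = {size:>5}: max pmf gap = {max_pmf_gap(mu, size):.6f}")
```

As size grows, the extra variance term mu^2/size shrinks toward zero, so the gap between the two distributions shrinks with it.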
Example 8.31: Choropleth maps
In our book, we show a simple example of a map (section 6.4.2) where we read the boundary files as data sets and use SAS and R to plot them. But both SAS and R have complex functionality for using pre-compiled map data. To demonstrate them, we’ll show how to make a simple choropleth map, using US Census data available here. The file also inc...
Example 8.32: The HistData package, sunflower plots, and getting data from R into SAS
This entry is mainly a promotion of the fascinating HistData R package. The package, compiled by the psychologist, statistician, and graphics innovator Michael Friendly, contains a number of small data sets of historical interest. These include data from John Snow’s map of cholera in London, Minard’s map of Napoleon’s Russian campaign of ...
Example 8.33: Merging data sets one-to-many
It’s often necessary to combine data from two data sets for further analysis. Such merging can be one-to-one, many-to-one, or many-to-many. The most common form is the one-to-one match, which we cover in section 1.5.7. Today we look at a one-to-many merge. Since the Major League Baseball season started last Thursday, we’ll use baseball as ...
Example 8.35: Grab true (not pseudo) random numbers; passing API URLs to functions or macros
Usually, we’re content to use a pseudo-random number generator. But sometimes we may want numbers that are actually random: an example might be randomizing treatment status in a randomized controlled trial. The site Random.org provides truly random numbers based on radio static. For long simulations, its quota system may prev...
Example 8.37: Read sheets from an Excel file
Microsoft Excel is an awkward tool for data analysis. However, it is a reasonable environment for recording and transferring data. In our consulting practice, people frequently send us data in .xls (from Excel 97-2003) or .xlsx (from Excel 2007 or 2010) formatted files. In order to use the data in statistical software, you have to get...
Example 8.39: Calculating Cramer’s V
Cramer’s V is a measure of association for nominal variables. Effectively it is the Pearson chi-square statistic rescaled to have values between 0 and 1, as follows: V = sqrt(X^2 / [nobs * (min(ncols, nrows) - 1)]), where X^2 is the Pearson chi-square, nobs represents the number of observations included in the table, and where ncols and nrows a...
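The formula in this teaser can be checked by hand on a small table. The sketch below is not the post's SAS or R code: the function name and the two toy tables are assumptions for illustration. It uses the plain Pearson chi-square with no continuity correction, and it assumes every row and column total is positive so that no expected count is zero.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    nrows, ncols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    nobs = sum(row_totals)

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
    chisq = 0.0
    for i in range(nrows):
        for j in range(ncols):
            expected = row_totals[i] * col_totals[j] / nobs
            chisq += (table[i][j] - expected) ** 2 / expected

    # Rescale by nobs * (min(ncols, nrows) - 1), as in the formula above.
    return math.sqrt(chisq / (nobs * (min(ncols, nrows) - 1)))


independent = [[10, 20], [20, 40]]  # hypothetical table with no association
perfect = [[30, 0], [0, 30]]        # hypothetical table with perfect association
print(cramers_v(independent))
print(cramers_v(perfect))
```

The two toy tables sit at the ends of the scale: the first has identical row proportions, and in the second each row determines the column exactly.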
Example 8.40: Side-by-side histograms
It’s often useful to compare histograms for some key variable, stratified by levels of some other variable. There are several ways to display something like this. The simplest may be to plot the two histograms in separate panels. SAS: In SAS, the most direct and generalizable approach is through the sgpanel procedure. proc sgpanel data = 'c:\book...
Example 8.41: Scatterplot with marginal histograms
The scatterplot is one of the most ubiquitous and useful graphics. It’s also very basic. One of its shortcomings is that it can hide important aspects of the marginal distributions of the two variables. To address this weakness, you can add a histogram of each margin to the plot. We demonstrate using the SF-36 MCS and PCS subscales in the ...