Publications by Nick Horton

Example 8.42: skewness and kurtosis and more moments (oh my!)

27.06.2011

While skewness and kurtosis are not as often calculated and reported as mean and standard deviation, they can be useful at times. Skewness is the 3rd moment around the mean, and characterizes whether the distribution is symmetric (skewness=0). Kurtosis is a function of the 4th central moment, and characterizes peakedness, where the normal distr...

1731 sym R (942 sym/4 pcs) 18 img
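
As a rough illustration of the moments described in the entry above, the following base R sketch computes moment-based skewness and kurtosis. The helper names (central_moment, skewness, kurtosis) are hypothetical and not taken from the original post, and the estimators divide by n rather than applying any small-sample correction.

```r
# Moment-based skewness and kurtosis (base R only).
# Helper names are illustrative assumptions, not the post's own code.
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: 3rd central moment scaled by the 2nd central moment^(3/2)
skewness <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)

# Kurtosis: 4th central moment scaled by the squared 2nd central moment
kurtosis <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

set.seed(42)
x <- rnorm(10000)   # simulated normal data
skewness(x)         # should be near 0 for a symmetric distribution
kurtosis(x)         # should be near 3 for a normal distribution
```

Subtracting 3 from the kurtosis gives the "excess kurtosis" that some packages report instead.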

Example 9.1: Scatterplots with binning for large datasets

05.07.2011

Scatterplots can get very hard to interpret when displaying large datasets, as points inevitably overplot and can’t be individually discerned. A number of approaches have been crafted to help with this problem. One approach uses binning. This approach is also sometimes called a heat map, and can be thought of as a two-dimensional histogram, w...

2576 sym R (490 sym/2 pcs) 18 img
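
The two-dimensional histogram idea in the entry above can be sketched in base R by cutting each axis into bins and tabulating the pairs. This is a minimal sketch on simulated data; the variable names and the choice of 30 bins are assumptions, and the original post may rely on a dedicated package instead.

```r
# Binned scatterplot as a two-dimensional histogram (base R only).
set.seed(1)
n <- 100000
x <- rnorm(n)            # simulated data, for illustration only
y <- x + rnorm(n)

nbins <- 30              # arbitrary choice
x_breaks <- seq(min(x), max(x), length.out = nbins + 1)
y_breaks <- seq(min(y), max(y), length.out = nbins + 1)

# Assign each point to a bin on each axis, then count points per cell
xbin <- cut(x, breaks = x_breaks, include.lowest = TRUE)
ybin <- cut(y, breaks = y_breaks, include.lowest = TRUE)
counts <- table(xbin, ybin)

# Draw the cell counts as a heat map; log1p keeps sparse cells visible
z <- matrix(as.vector(counts), nrow = nbins)
image(x_breaks, y_breaks, log1p(z), xlab = "x", ylab = "y")
```

Plotting counts per cell rather than individual points keeps the display readable however many observations fall in each region.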

Example 9.3: augmented display of contingency table

18.07.2011

SAS and R often provide different levels of detail in their output. This is particularly true for the descriptive analysis of contingency tables, where SAS makes it easy to display tables with additional quantities (such as the observed cell count). The mosaic package has added functionality to calculate these quantities in R. We demonstrate using ...

3097 sym R (1479 sym/3 pcs) 18 img

Example 9.9: Simplifying R using the mosaic package (part 1)

13.10.2011

While both SAS and R are powerful systems for statistical analysis, they can be frustrating to new users or those learning statistics for the first time. R: The mosaic package is designed to help simplify the interface for such new users, while allowing them to undertake sophisticated analyses. As an example of how the package simplifies life for ...

2902 sym R (2286 sym/7 pcs) 18 img

Example 9.10: more regression trees and recursive partitioning with "partykit"

17.10.2011

We discuss recursive partitioning, a technique for classification and regression using a decision tree in section 6.7.3 of the book. Support for these methods is available within the rpart package. Torsten Hothorn and Achim Zeileis have extended the support for these methods with the partykit package, which provides a toolkit with infrastructur...

1575 sym R (843 sym/3 pcs) 16 img

Example 9.12: simpler ways to carry out permutation tests

31.10.2011

In a previous entry, as well as section 2.4.3 of the book, we describe how to carry out a two-group permutation test in SAS as well as with the coin package in R. We demonstrate by comparing the ages of the female and male subjects in the HELP study. In this entry, we revisit the permutation test using other functions. R: We describe a simpler inter...

3101 sym R (2085 sym/7 pcs) 20 img
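
For readers who want to see the mechanics behind the entry above, here is a minimal base R sketch of a two-group permutation test on a difference in means. The function perm_test is a hypothetical helper, not the interface the post describes, and the ages below are simulated rather than drawn from the HELP study.

```r
# Two-group permutation test for a difference in means (base R only).
# perm_test is an illustrative helper, not the post's own function.
perm_test <- function(y, group, nperm = 5000) {
  groups <- unique(group)
  stopifnot(length(groups) == 2)
  diff_means <- function(g) mean(y[g == groups[1]]) - mean(y[g == groups[2]])
  observed <- diff_means(group)
  # Shuffle the group labels and recompute the statistic each time
  permuted <- replicate(nperm, diff_means(sample(group)))
  # Two-sided p-value, counting the observed statistic as one permutation
  p_value <- (sum(abs(permuted) >= abs(observed)) + 1) / (nperm + 1)
  list(observed = observed, p_value = p_value)
}

set.seed(123)
age <- c(rnorm(50, mean = 36, sd = 8), rnorm(50, mean = 35, sd = 8))  # simulated
sex <- rep(c("female", "male"), each = 50)
perm_test(age, sex)
```

Adding one to the numerator and denominator keeps the estimated p-value away from exactly zero when only a sample of permutations is drawn.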

Example 9.14: confidence intervals for logistic regression models

15.11.2011

Recently a student asked about the difference between the confint() and confint.default() functions, both available in the MASS library to calculate confidence intervals from logistic regression models. The following example demonstrates that they yield different results. R: ds = read.csv("http://www.math.smith.edu/r/data/help.csv") library...

1999 sym R (2653 sym/3 pcs) 16 img
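
The difference discussed in the entry above can be sketched on simulated data: for a fitted glm, confint() computes profile-likelihood intervals, while confint.default() computes Wald intervals from the estimated standard errors. The data and coefficients below are made up for illustration and are not the HELP dataset used in the post. In R versions before 4.4.0 the glm method for confint() relies on the MASS package being installed.

```r
# Profile-likelihood vs. Wald intervals for a logistic regression.
# Simulated data for illustration only (not the post's dataset).
set.seed(2011)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * x))

fit <- glm(y ~ x, family = binomial)

# Profile-likelihood intervals (messages about profiling suppressed)
ci_profile <- suppressMessages(confint(fit))

# Wald intervals: estimate +/- normal quantile * standard error
ci_wald <- confint.default(fit)

ci_profile
ci_wald
```

Wald intervals are symmetric around the estimate by construction, whereas profile-likelihood intervals need not be; the two tend to agree more closely as the sample size grows.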

Example 9.17: (much) better pairs plots

06.12.2011

Pairs plots (section 5.1.17) are a useful way of displaying the pairwise relations between variables in a dataset. But the default display is unsatisfactory when the variables aren’t all continuous. In this entry, we discuss ways to improve these displays that have been proposed by John Emerson, Walton Green, Barret Schloerke, Dianne Cook, He...

2086 sym R (480 sym/2 pcs) 18 img

Example 9.20: visualizing Simpson’s paradox

07.02.2012

Simpson’s paradox is always amazing to explain to students. What’s bad for one group, and bad for another group, is good for everyone, if you just collapse over the grouping variable. Unlike many mathematical paradoxes, this arises in a number of real-world settings. In this entry, we consider visualizing Simpson’s paradox, using data fro...

3374 sym R (1743 sym/7 pcs) 20 img
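
To make the reversal in the entry above concrete, the sketch below uses UCBAdmissions, a dataset that ships with R. This is a swapped-in example, chosen because it is built in; it is not the data used in the original post. It compares admission rates by gender overall and within each department.

```r
# Simpson's paradox with the built-in UCBAdmissions table
# (dimensions: Admit x Gender x Dept). Not the post's dataset.

# Collapse over department, then compute admission rate by gender
overall <- apply(UCBAdmissions, c(1, 2), sum)
overall_rate <- prop.table(overall, margin = 2)["Admitted", ]

# Admission rate by gender within each department
by_dept_rate <- prop.table(UCBAdmissions, margin = c(2, 3))["Admitted", , ]

round(overall_rate, 2)   # men have the higher rate overall
round(by_dept_rate, 2)   # within several departments, women have the higher rate
```

The direction of the comparison depends on whether department is collapsed over, which is the pattern the paradox describes.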

managing projects using RStudio

10.02.2012

We’re continually amazed by new developments within RStudio, the integrated development environment for R that we highlighted previously (among others, Andrew Gelman agrees with us about its value). The most recent addition addresses one of our earlier concerns, by adding support for projects within RStudio. These allow work to be...

2032 sym 16 img