Publications by Nick Horton
Example 8.42: skewness and kurtosis and more moments (oh my!)
While skewness and kurtosis are not calculated and reported as often as the mean and standard deviation, they can be useful at times. Skewness is the standardized 3rd moment around the mean, and characterizes whether the distribution is symmetric (skewness=0). Kurtosis is a function of the 4th central moment, and characterizes peakedness, where the normal distr...
1731 sym R (942 sym/4 pcs) 18 img
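The moment definitions mentioned in this entry can be sketched in base R. This is a minimal illustration, not the code from the entry itself: the helper names `central_moment`, `skewness`, and `kurtosis` are hypothetical and defined here, and the moments use the simple 1/n form without small-sample corrections.

```r
# k-th central moment: the average of (x - mean)^k (1/n form, no bias correction)
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: 3rd central moment standardized by the 2nd; 0 for symmetric data
skewness <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)

# Kurtosis: 4th central moment standardized by the 2nd; 3 for a normal distribution
kurtosis <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

x <- c(1, 2, 3, 4, 5)  # small symmetric toy sample
skewness(x)            # 0, because the sample is symmetric
kurtosis(x)            # 1.7
```

Subtracting 3 from the kurtosis gives the "excess" kurtosis, which is 0 for a normal distribution.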
Example 9.1: Scatterplots with binning for large datasets
Scatterplots can get very hard to interpret when displaying large datasets, as points inevitably overplot and can’t be individually discerned. A number of approaches have been crafted to help with this problem. One approach uses binning. This approach is also sometimes called a heat map, and can be thought of as a two-dimensional histogram, w...
2576 sym R (490 sym/2 pcs) 18 img
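The binning idea can be sketched with base R alone. This is only an illustration under stated assumptions: the data are simulated, the 20 x 20 grid is an arbitrary choice, and the entry itself may well use a dedicated package rather than `cut()` and `image()`.

```r
set.seed(1)
n <- 10000
x <- rnorm(n)        # simulated data, for illustration only
y <- x + rnorm(n)

# Two-dimensional histogram: count the points falling in each cell of a 20 x 20 grid
counts <- table(cut(x, breaks = 20), cut(y, breaks = 20))

# Draw the cell counts as a heat map; denser cells get a different colour
image(unclass(counts), col = heat.colors(12),
      xlab = "x (binned)", ylab = "y (binned)")
```

Because every point lands in exactly one cell, the counts sum to the number of observations, which is a quick sanity check on the binning.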
Example 9.3: augmented display of contingency table
SAS and R often provide different levels of detail in their output. This is particularly true for the descriptive analysis of contingency tables, where SAS makes it easy to display tables with additional quantities (such as the observed cell count). The mosaic package has added functionality to calculate these quantities in R. We demonstrate using ...
3097 sym R (1479 sym/3 pcs) 18 img
Example 9.9: Simplifying R using the mosaic package (part 1)
While both SAS and R are powerful systems for statistical analysis, they can be frustrating to new users or those learning statistics for the first time. The mosaic package is designed to help simplify the interface for such new users, while allowing them to undertake sophisticated analyses. As an example of how the package simplifies life for ...
2902 sym R (2286 sym/7 pcs) 18 img
Example 9.10: more regression trees and recursive partitioning with "partykit"
In section 6.7.3 of the book, we discuss recursive partitioning, a technique for classification and regression using a decision tree. Support for these methods is available within the rpart package. Torsten Hothorn and Achim Zeileis have extended the support for these methods with the partykit package, which provides a toolkit with infrastructur...
1575 sym R (843 sym/3 pcs) 16 img
Example 9.12: simpler ways to carry out permutation tests
In a previous entry, as well as section 2.4.3 of the book, we describe how to carry out a two-group permutation test in SAS as well as with the coin package in R. We demonstrate by comparing the ages of the female and male subjects in the HELP study. In this entry, we revisit the permutation test using other functions. We describe a simpler inter...
3101 sym R (2085 sym/7 pcs) 20 img
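The logic of a two-group permutation test can be sketched in base R. This is not the interface the entry goes on to describe: the ages below are made-up toy values rather than the HELP data, and `diff_means` is a hypothetical helper defined here.

```r
set.seed(42)

# Made-up toy ages for illustration; these are NOT the HELP study data
age   <- c(34, 29, 41, 38, 45, 33, 36, 40, 31, 37)
group <- rep(c("female", "male"), each = 5)

# Test statistic: difference in mean age (male minus female)
diff_means <- function(y, g) mean(y[g == "male"]) - mean(y[g == "female"])

observed <- diff_means(age, group)

# Shuffle the group labels many times to approximate the null distribution
perm <- replicate(5000, diff_means(age, sample(group)))

# Two-sided p-value: share of shuffles at least as extreme as the observed difference
p_value <- mean(abs(perm) >= abs(observed))
```

Shuffling the labels breaks any association between group and age, so the permuted differences show what would be expected if the two groups had the same age distribution.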
Example 9.14: confidence intervals for logistic regression models
Recently a student asked about the difference between the confint() and confint.default() functions, both available in the MASS library to calculate confidence intervals from logistic regression models. The following example demonstrates that they yield different results. ds = read.csv("http://www.math.smith.edu/r/data/help.csv") library...
1999 sym R (2653 sym/3 pcs) 16 img
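The difference between the two functions can be seen on any logistic regression fit. The sketch below assumes the built-in `mtcars` data and an arbitrary model (`am ~ wt`) instead of the HELP dataset used in the entry. For a `glm` fit, `confint()` computes profile-likelihood intervals, while `confint.default()` computes Wald intervals from the estimate and its standard error.

```r
# Logistic regression on a built-in dataset (an assumption; the entry uses the HELP data)
fit <- glm(am ~ wt, family = binomial, data = mtcars)

# Profile-likelihood intervals (the glm method for confint)
profile_ci <- confint(fit)

# Wald intervals: estimate +/- normal quantile * standard error
wald_ci <- confint.default(fit)

# The Wald interval can be reproduced by hand
se <- sqrt(diag(vcov(fit)))
manual_wald <- coef(fit)["wt"] + c(-1, 1) * qnorm(0.975) * se["wt"]

profile_ci
wald_ci
```

Wald intervals are symmetric around the estimate by construction; profile-likelihood intervals need not be, which is one reason the two can disagree noticeably in small samples.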
Example 9.17: (much) better pairs plots
Pairs plots (section 5.1.17) are a useful way of displaying the pairwise relations between variables in a dataset. But the default display is unsatisfactory when the variables aren’t all continuous. In this entry, we discuss ways to improve these displays that have been proposed by John Emerson, Walton Green, Barret Schloerke, Dianne Cook, He...
2086 sym R (480 sym/2 pcs) 18 img
Example 9.20: visualizing Simpson’s paradox
Simpson’s paradox is always amazing to explain to students. What’s bad for one group, and bad for another group, is good for everyone, if you just collapse over the grouping variable. Unlike many mathematical paradoxes, this arises in a number of real-world settings. In this entry, we consider visualizing Simpson’s paradox, using data fro...
3374 sym R (1743 sym/7 pcs) 20 img
managing projects using RStudio
We’re continually amazed by new developments within RStudio, the integrated development environment for R that we highlighted previously (among others, Andrew Gelman agrees with us about its value). The most recent addition addresses one of our earlier concerns, by adding support for projects within RStudio. These allow work to be...
2032 sym 16 img