Publications by Ralph

Summarising data using scatter plots

18.04.2010

A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes are used for the values of the two variables and a symbol on the graph represents the combination for each pair of values in the data set. This type of graph is used in many common situations and can convey a lot of useful informat...

4365 sym R (662 sym/4 pcs) 6 img
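As a rough illustration of the idea in the entry above, a basic scatter plot can be drawn with base R's plot function. The built-in mtcars data and the choice of variables are assumptions made for this sketch and are not taken from the original post.

```r
# Illustrative data: the built-in mtcars data set (an assumption, not from the post)
dat <- mtcars[, c("wt", "mpg")]

# One symbol per (wt, mpg) pair: wt on the x axis, mpg on the y axis
plot(dat$wt, dat$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel economy against weight",
     pch = 19)

# A numerical companion to the plot: the sample correlation
r <- cor(dat$wt, dat$mpg)
```

The correlation is only a one-number summary; the plot itself is what shows whether the relationship looks linear.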

R and Tolerance Intervals

19.04.2010

Confidence intervals and prediction intervals are used by statisticians on a regular basis. Another useful interval is the tolerance interval, which describes a range of values expected to cover a particular percentile of a distribution, with limits calculated to a stated level of confidence. The R package tolerance can be used to create a variety of tolerance i...

2079 sym R (513 sym/3 pcs)
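To make the idea of a tolerance interval concrete, here is a minimal sketch of a one-sided upper tolerance bound for normally distributed data, computed in base R from the non-central t distribution. It does not use the tolerance package mentioned above; the simulated data and the variable names (P, alpha, k) are assumptions chosen for illustration.

```r
# Simulated sample (hypothetical data for illustration only)
set.seed(42)
x <- rnorm(25, mean = 100, sd = 10)

n     <- length(x)
P     <- 0.95   # proportion of the population to be covered
alpha <- 0.05   # 1 - alpha is the confidence level

# One-sided tolerance factor for normal data, from the non-central t distribution
k <- qt(1 - alpha, df = n - 1, ncp = qnorm(P) * sqrt(n)) / sqrt(n)

# Upper bound intended to exceed at least 95% of the population, with 95% confidence
upper <- mean(x) + k * sd(x)
```

The factor k is larger than the plain normal quantile qnorm(P) because it also allows for the uncertainty in the estimated mean and standard deviation.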

Book Review – ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham (Springer 2009)

20.04.2010

This book is written by the author of the ggplot2 package for R, whose design is inspired by the grammar of graphics and which can remove some of the effort required to put together impressive graphs. The book is just under 200 pages and covers a decent range of material to introduce new and experienced R ...

4610 sym 2 img

Simple Linear Regression

23.04.2010

One of the most frequently used techniques in statistics is linear regression, where we investigate the potential relationship between a variable of interest (often called the response variable, but there are many other names in use) and a set of one or more variables (known as the independent variables or some other term). Unsurprisingly there are f...

6081 sym R (1416 sym/6 pcs) 6 img
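A simple linear regression of the kind described above can be sketched with lm. The simulated data, the true intercept and slope, and the object names are all assumptions made for this example.

```r
# Simulated data with a known straight-line relationship (illustrative only)
set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 1)
dat <- data.frame(x = x, y = y)

# Fit the response y against the single explanatory variable x
fit <- lm(y ~ x, data = dat)

# Estimated intercept and slope
est <- coef(fit)
print(est)

# Standard errors, t tests and R-squared
print(summary(fit))
```

Because the data were simulated with a slope of 2, the fitted slope should land close to that value, which gives a quick sanity check on the model call.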

Summarising data using box and whisker plots

25.04.2010

A box and whisker plot is a type of graphical display that can be used to summarise a set of data based on the five number summary of this data. The summary statistics used to create a box and whisker plot are the median of the data, the lower and upper quartiles (25% and 75%) and the minimum and maximum values. The box and whisker plot is an eff...

4289 sym R (746 sym/4 pcs) 12 img
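The five-number summary behind a box and whisker plot, as described above, can be computed directly in base R. The sample values below are made up for illustration.

```r
# Hypothetical sample, invented for this sketch
x <- c(2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 30)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
names(five) <- c("min", "lower", "median", "upper", "max")
print(five)

# The values a box plot would draw when whiskers run to the data extremes
# (coef = 0 switches off the default 1.5 * IQR whisker rule)
drawn <- boxplot.stats(x, coef = 0)$stats
```

Note that boxplot by default limits the whiskers to 1.5 times the interquartile range and plots more extreme points individually, so the whisker ends match the minimum and maximum only when no points are flagged in this way.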

Analysis of Covariance – Extending Simple Linear Regression

28.04.2010

The simple linear regression model considers the relationship between two variables, and in many cases more information will be available that can be used to extend the model. For example, there might be a categorical variable (sometimes known as a covariate) that can be used to divide the data set to fit a separate linear regression to each of th...

4755 sym R (3888 sym/9 pcs) 4 img
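One way to sketch the extension described above is to compare a single regression line with models that allow the line to differ between the levels of a categorical variable. The built-in mtcars data and the use of transmission type as the grouping variable are assumptions for illustration, not the data from the original post.

```r
# Illustrative data: mtcars, with transmission type as the categorical variable
dat <- mtcars
dat$am <- factor(dat$am, labels = c("automatic", "manual"))

# One common line for all cars
fit_common <- lm(mpg ~ wt, data = dat)

# Parallel lines: separate intercepts, shared slope
fit_parallel <- lm(mpg ~ wt + am, data = dat)

# Separate lines: intercept and slope both depend on the group
fit_separate <- lm(mpg ~ wt * am, data = dat)

# Sequential F tests comparing the three nested models
comparison <- anova(fit_common, fit_parallel, fit_separate)
print(comparison)
```

Fitting the nested sequence makes it possible to judge whether the extra intercept and slope terms are worth keeping.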

Using the update function during variable selection

09.05.2010

When fitting statistical models to data with multiple variables, we are often interested in adding or removing terms from our model. When there are a large number of terms, it can be quicker to use the update function to start with a formula from a model that we have already fitted and to specify the terms that we want to ad...

2342 sym R (1610 sym/3 pcs)
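The use of update described above might look like the following sketch. The mtcars data and the particular terms added and removed are assumptions chosen for illustration.

```r
# Starting model (illustrative choice of data and terms)
fit1 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Remove a term: the dots stand for "whatever was there before"
fit2 <- update(fit1, . ~ . - qsec)

# Add a term to the reduced model
fit3 <- update(fit2, . ~ . + disp)

# Inspect the terms of each updated model
terms2 <- attr(terms(fit2), "term.labels")
terms3 <- attr(terms(fit3), "term.labels")
print(formula(fit2))
print(formula(fit3))
```

Only the change to the formula has to be typed; the data and other arguments are carried over from the original call.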

Book Review – Modern Applied Statistics with S by W. N. Venables and B. D. Ripley (Springer 2003)

09.05.2010

Modern Applied Statistics with S (Fourth Edition) is one of the oldest and most popular books on applied statistics using R and S-PLUS. A large number of topics in applied statistics are covered in this book and it is certainly not for the faint-hearted. A sound knowledge of the statistical methods covered in each Ch...

9389 sym 2 img

Manual variable selection using the dropterm function

12.05.2010

When fitting a multiple linear regression model to data, a natural question is whether the model can be simplified by excluding variables. There are automatic procedures for undertaking these tests, but some people prefer to follow a more manual approach to variable selection rather than pressing a button and taking what comes out. Fas...

2589 sym R (1628 sym/4 pcs) 2 img
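A manual selection step with dropterm, which lives in the MASS package, could be sketched as follows. The mtcars data and the starting model are assumptions made for this example.

```r
library(MASS)  # provides dropterm()

# Starting model (illustrative choice of data and terms)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Test the effect of dropping each term in turn, one at a time
tab <- dropterm(fit, test = "F")
print(tab)
```

The table has one row for the current model and one row per term, so the analyst can choose which term, if any, to remove before refitting.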

Linear regression models with robust parameter estimation

15.05.2010

There are situations in regression modelling where robust methods could be considered to handle unusual observations that do not follow the general trend of the data set. Various R packages provide robust statistical methods, and these are summarised on the CRAN Robust Task View. As an example of using robust statistical estimation i...

1881 sym R (2684 sym/5 pcs)
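As one possible illustration of robust estimation, the sketch below compares ordinary least squares with rlm from the MASS package on simulated data containing a single outlier. The choice of rlm, the simulated data, and the object names are assumptions; the original post may use a different function or data set.

```r
library(MASS)  # provides rlm()

# Simulated straight-line data with one gross outlier (illustrative only)
set.seed(123)
x <- 1:20
y <- 1 + 2 * x + rnorm(20, sd = 1)
y[20] <- y[20] + 50
dat <- data.frame(x = x, y = y)

# Ordinary least squares: the outlier pulls the fitted slope away from 2
ols <- lm(y ~ x, data = dat)

# Robust M-estimation (Huber weights by default) downweights the outlier
rob <- rlm(y ~ x, data = dat)

slopes <- c(ols = coef(ols)[["x"]], robust = coef(rob)[["x"]])
print(slopes)
```

Comparing the two slopes against the value of 2 used in the simulation shows how much a single unusual observation can influence the least squares fit.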