Publications by Eric Cai - The Chemical Statistician

Side-by-Side Box Plots with Patterns From Data Sets Stacked by reshape2 and melt() in R

10.04.2014

Introduction A while ago, one of my co-workers asked me to group box plots by plotting them side-by-side within each group, and he wanted to use patterns rather than colours to distinguish between the box plots within a group; the publication that will display his plots prints in black-and-white only.  I gladly investigated how to do this in R, ...

6532 sym R (2220 sym/4 pcs) 24 img

The Chi-Squared Test of Independence – An Example in Both R and SAS

25.08.2014

Introduction The chi-squared test of independence is one of the most basic and common hypothesis tests in the statistical analysis of categorical data.  Given 2 categorical random variables, the chi-squared test of independence determines whether or not there exists a statistical dependence between them.  Formally, it is a hypothesis te...

4722 sym R (1888 sym/3 pcs) 52 img 1 tbl

Online index of plots and corresponding R scripts

29.10.2014

Dear Readers of The Chemical Statistician, While working in my job at the British Columbia Cancer Agency, I learned about a wonderful new data visualization resource from a colleague who works at the British Columbia Centre for Disease Control.  I want to share this with you, as I think that it will help you immensely in your ef...

1604 sym 2 img

Performing Logistic Regression in R and SAS

24.11.2014

Introduction My statistics education focused a lot on normal linear least-squares regression, and I was even told by a professor in an introductory statistics class that 95% of statistical consulting can be done with knowledge learned up to and including a course in linear regression.  Unfortunately, that advice has turned out to vastly underest...

3610 sym R (2906 sym/3 pcs) 16 img 12 tbl

Exploratory Data Analysis – All Blog Posts on The Chemical Statistician

11.12.2014

This series of posts introduced various methods of exploratory data analysis, providing theoretical backgrounds and practical examples.  Fully commented and readily usable R scripts are available for all topics for you to copy and paste for your own analysis!  Most of these posts involve data visualization and plotting, and I include a lot of ...

2425 sym 16 img

How to Get the Frequency Table of a Categorical Variable as a Data Frame in R

03.02.2015

Introduction One feature that I like about R is the ability to access and manipulate the outputs of many functions.  For example, you can extract the kernel density estimates from density() and scale them to ensure that the resulting density integrates to 1 over its support set. I recently needed to get a frequency table of a categorical variab...

3026 sym R (811 sym/7 pcs) 16 img

The advantages of using count() to get N-way frequency tables as data frames in R

12.02.2015

Introduction I recently showed how to use the count() function in the “plyr” package in R to produce 1-way frequency tables.  Several commenters provided alternative ways of doing so, and they are all appreciated.  Today, I want to extend that tutorial by demonstrating how count() can be used to produce N-way frequency tables in th...

3091 sym R (1245 sym/7 pcs) 16 img

Resources for Learning Data Manipulation in R, SAS and Microsoft Excel

23.02.2015

I had the great pleasure of speaking to the Department of Statistics and Actuarial Science at Simon Fraser University last Friday to share my career advice with its students and professors.  I emphasized the importance of learning skills in data manipulation during my presentation, and I want to supplement my presentation by posting some usef...

2115 sym 16 img

How to Extract a String Between 2 Characters in R and SAS

18.06.2015

Introduction I recently needed to work with date values that look like this (variable mydate): Jan 23/2, Aug 5/20, Dec 17/2.  I wanted to extract the day, and the obvious strategy is to extract the text between the space and the slash.  I needed to think about how to program this carefully in both R and SAS, because the length of the day could be 1 or 2 ...

2844 sym Python (2523 sym/5 pcs) 16 img 2 tbl

Producing a Control Chart in R – An Application in Analytical Chemistry

02.08.2015

Introduction Many processes in chemistry, especially in synthesis, require attaining a certain target value for a property of interest.  For example, when synthesizing drug capsules that contain a medicine, a chemist has to ensure that the concentration of the medicine meets a target value.  If the concentration is too high or too low, then th...

5832 sym Python (1720 sym/1 pcs) 36 img 1 tbl