Lab2: ANOVA in R


Getting Started with R

Click on the R icon

If no R icon, go to Start and then All Programs to find R or search.

You are now ready to continue. Let the games begin.

Examples are from Linear Models in R, by J.J. Faraway


One-Way Classfication

We will use the dataset coagulation. (It is available from the library faraway.)
Study of blood coagulation times for 24 animals randomly assigned to 4 diets, from Box, Hunter, and Hunter (1978).

We would like to know if there is a difference in treatment means.

Let's get the data using the source command:

> source("https://edoras.sdsu.edu/~babailey/stat700/coagulation.R")
Let's plot the data.

> par(mfrow=c(1,2)) 
> boxplot(coag ~ diet, coagulation, ylab="Coagulation time")
> with(coagulation, stripchart(coag ~ diet, vertical=T, method="stack", xlab="Diet", ylab="Coagulation Time"))
First, let's see what R does as a default:
> fit1 <- lm(coag ~ diet, coagulation)
> summary(fit1)
Looks like A is the reference level and has mean 61, B is 5 larger, C is 7 larger and D is 0 seconds larger.
What are the F and p-value? You can also get these with
> anova(fit1)

Let's look at some diagnostics plots of the residuals for first fit1:

> par(pty="s")
> qqnorm(residuals(fit1))
> qqline(residuals(fit1))
To see the design matrix look at:
> model.matrix(fit1)
Next, fit a model without an intercept and one with just an intercept to get:
> fitnoi <- lm(coag ~ diet -1, coagulation)
> summary(fitnoi)
> fitnull <- lm(coag ~ 1, coagulation)
> summary(fitnull)
> anova(fitnull, fitnoi)
What do you notice about the F and p-value?

You can also use contrasts,

> options(contrasts=c("contr.sum", "contr.poly"))
> fit2 <- lm(coag ~ diet, coagulation)
> summary(fit2)
Now, overall mean is 64 with estimated mean response for A 64-3=61, and 66 for B and 68 for D.


What about multiple comparisons?

There is a pairwise.t.test with p.adj="bonferroni".
You must use the attach function before you call the pairwise.t.test!
What does the attach function do?


Two-Way Classfication

We will use the dataset pvc.
From an experiment to study factors affecting particle size of PVC: 3 operators used 8 different devices called resin railcars. 2 samples were produced for each of the 24 combinations. From Morris and Watson (1998).

Let's get the PVC data using the source command:

> source("https://edoras.sdsu.edu/~babailey/stat700/pvc.R")
Let's make some plots and interaction plots.
> attach(pvc)
> stripchart(psize ~ resin, xlab="Particle size", ylab="Resin railcar")
> stripchart(psize ~ operator, xlab="Particle size", ylab="Operator")
> interaction.plot(operator, resin, psize)
> interaction.plot(resin, operator, psize)
Let's fit a full model with interations (saturated):
> fit <- lm(psize~operator+resin+operator*resin, data=pvc)
> anova(fit)
Main effects are signficant. Interaction effect is not significant.

Make diagnostics plots, including the Residuals vs Fitted Values.

Fit just a main effects model and see if there are any outliers?


What does "Make sure a factor is a factor", mean?

Let's read in the pvc dataset from a CSV file:

pvc2 <- read.csv("http://www.rohan.sdsu.edu/~babailey/stat700/pvc2.csv", header=T, row.names=1)
How does pvc2 compare to pvc? Use the str and summary commands.

You can choose to make operator and resin a factor or just include the as.factor (or factor) command in the lm function.

fit2 <- lm(psize2 ~ as.factor(operator2) + as.factor(resin2) + as.factor(operator2)*as.factor(resin2), data=pvc2)

pvc2$operator2 <- as.factor(pvc2$operator2)
pvc2$resin2 <- as.factor(pvc2$resin2) 
fi3 <- lm(psize2~operator2+resin2+operator2*resin2, data=pvc2)


Reference: Linear Models with R, by J.J. Faraway