Click on the R icon
If no R icon, go to Start and then All Programs to find R or search.
You are now ready to continue. Let the games begin.
Examples are from Linear Models in R, by J.J. Faraway
We will use the dataset coagulation. (It is available from the library faraway.)
Study of blood coagulation times for 24 animals randomly assigned to 4 diets,
from Box, Hunter, and Hunter (1978).
We would like to know if there is a difference in treatment means.
Let's get the data using the source command:
> source("https://edoras.sdsu.edu/~babailey/stat700/coagulation.R")Let's plot the data.
> par(mfrow=c(1,2)) > boxplot(coag ~ diet, coagulation, ylab="Coagulation time") > with(coagulation, stripchart(coag ~ diet, vertical=T, method="stack", xlab="Diet", ylab="Coagulation Time"))First, let's see what R does as a default:
> fit1 <- lm(coag ~ diet, coagulation) > summary(fit1)Looks like A is the reference level and has mean 61, B is 5 larger, C is 7 larger and D is 0 seconds larger.
> anova(fit1)
Let's look at some diagnostics plots of the residuals for first fit1:
> par(pty="s") > qqnorm(residuals(fit1)) > qqline(residuals(fit1))To see the design matrix look at:
> model.matrix(fit1)Next, fit a model without an intercept and one with just an intercept to get:
> fitnoi <- lm(coag ~ diet -1, coagulation) > summary(fitnoi) > fitnull <- lm(coag ~ 1, coagulation) > summary(fitnull) > anova(fitnull, fitnoi)What do you notice about the F and p-value?
You can also use contrasts,
> options(contrasts=c("contr.sum", "contr.poly")) > fit2 <- lm(coag ~ diet, coagulation) > summary(fit2)Now, overall mean is 64 with estimated mean response for A 64-3=61, and 66 for B and 68 for D.
There is a pairwise.t.test with p.adj="bonferroni".
You must use the attach function before you call the pairwise.t.test!
What does the attach function do?
We will use the dataset pvc.
From an experiment to study factors affecting particle size of PVC: 3 operators used 8 different devices called resin railcars. 2 samples were
produced for each of the 24 combinations. From Morris and Watson (1998).
Let's get the PVC data using the source command:
> source("https://edoras.sdsu.edu/~babailey/stat700/pvc.R")Let's make some plots and interaction plots.
> attach(pvc) > stripchart(psize ~ resin, xlab="Particle size", ylab="Resin railcar") > stripchart(psize ~ operator, xlab="Particle size", ylab="Operator") > interaction.plot(operator, resin, psize) > interaction.plot(resin, operator, psize)Let's fit a full model with interations (saturated):
> fit <- lm(psize~operator+resin+operator*resin, data=pvc) > anova(fit)Main effects are signficant. Interaction effect is not significant.
Make diagnostics plots, including the Residuals vs Fitted Values.
Fit just a main effects model and see if there are any outliers?
What does "Make sure a factor is a factor", mean?
Let's read in the pvc dataset from a CSV file:
pvc2 <- read.csv("http://www.rohan.sdsu.edu/~babailey/stat700/pvc2.csv", header=T, row.names=1)How does pvc2 compare to pvc? Use the str and summary commands.
You can choose to make operator and resin a factor or just include the as.factor (or factor) command in the lm function.
fit2 <- lm(psize2 ~ as.factor(operator2) + as.factor(resin2) + as.factor(operator2)*as.factor(resin2), data=pvc2)
pvc2$operator2 <- as.factor(pvc2$operator2) pvc2$resin2 <- as.factor(pvc2$resin2) fi3 <- lm(psize2~operator2+resin2+operator2*resin2, data=pvc2)