Click on the R icon
You are now ready to continue. Let the games begin.
Data frames are a class of objects to represent data which are usually used
in fitting models.
They share many of the properties of
matrices and of lists, used as the fundamental data structure by
most of R's modeling software.
R has a library of datasets available with which to experiment. One of these, Puromycin we will use in a later lab.
Data frames can be created by using the read.table(), data.frame(), or the as.data.frame() functions.
Suppose a table is recorded in a text file.
The read.table command creates a data frame from a file.
Here is the file puromycin.dat
> temp <- read.table("https://edoras.sdsu.edu/~babailey/stat700/puromycin.dat", header = F, col.names = c("conc", "vel", "state"))
The data frames temp and Puromycin should be identical (we'll see this in a future lab).
Note: if the file had the column headings, conc, vel, state, then you
can use the header=T option.
The option skip=0 is the number of lines in the file to
skip before reading data.
The function lm() returns an R object called a fitted linear least-squares model object, or lm object for short:
lm(formula, data, ...)
For more information see the help function:
> help(lm)
We will use the R dataset LifeCycleSavings
> help(LifeCycleSavings)
We can plot the data:
> plot(LifeCycleSavings$pop15, LifeCycleSavings$sr) > title("First LCSavings Plot")
Let's fit a simple linear regression model:
> lmfit <- lm(sr ~ pop15, data=LifeCycleSavings)
What is the class of the object lmfit?
What is in the object lmfit? (the names function is useful)
You can get the summary and diagnostic plots by using the functions:
> summary(lmfit) > par(mfrow=c(2,2)) #2x2 plotting region (there is also a layout() function) > plot(lmfit)Note: that if you do not want an intercept, then formula = sr ~ pop15 - 1
How did R calculate the standard errors for the estimates? There is a vcov.lm function
> vcov(lmfit)
> sqrt(diag(vcov(lmfit)))
From the help file of lm, see under See Also
What are the generic functions for lm and what do they do? To get an ANOVA table:
> anova(lmfit)
Let's fit the full linear model for our dataset.
> pairs(LifeCycleSavings, panel = panel.smooth, main = "LifeCycleSavings data") > fm1 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings) > summary(fm1)
What about diagnostic plots? How well does the model fit the data?
To test whether any of the predictors have significance in the model,
i.e., whether
b_1 = b_2 =
b_3 = b_4 = 0.
We can look at the p-value 0.00079, is so small the null hypothesis is
rejected.
> help(step) > step(fm1)What is AIC?
> help(AIC)
> help(extractAIC)
There is a predict.lm function.
To test whether any of the predictors have significance in the model,
i.e., whether
b_1 = b_2 =
b_3 = b_4 = 0.
We can look at the p-value 0.00079, is so small the null hypothesis is
rejected.
or directly,
> tss <- sum((LifeCycleSavings$sr - mean(LifeCycleSavings$sr))^2) > rss <- deviance(fm1) > df.residual(fm1) > fstat <- ((tss-rss)/4)/(rss/df.residual(fm1)) > 1-pf(fstat,4,df.residual(fm1))
Test the null hypothesis that b_1 = 0, that pop15 is not significant and can be droped from the full model. (The test is given that all the other predictors are in the model.)
You could observe the t-value from the full model (p-value is 0.0026) and reject the null hypothesis. Or
A convenient way to compare two nested models is:
> m2 <- lm(sr ~ pop75 + dpi + ddpi, LifeCycleSavings) > anova(m2, fm1)
See if you can compute the p-value directly and the t-based test and p-value.
Here are some knitr resources
(Send me more and I will gladly add to this!)
Let's look at Example 2 Dr. Dribble:
From the STAT 672 Lab1
Here is a HW template that I have made for you (Thanks Nick!): STAT672HW0knitr.Rnw
Let's make a template for STAT 700 Lab1 from Section: Fiting Linear Regression Models
Try going through the following Intro. Lessons. The # sign is a comment, so you can copy and paste.
Here are some Introductory Lessons in R