Lab 1: Introduction to R (and More!)


Getting Started with R

Click on the R icon

To quit R:

> q()

What does it ask you? What does this mean?

For R help, for example to get help about the function rnorm, type:

> help(rnorm)

You can also go to Help on the R toolbar and select R Help.


Example 1: Let the games begin.

Generate a sample of 100 N(0,1) random variables.

> help(rnorm)

> rnorm(100)

What happened?
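Running rnorm(100) prints 100 new N(0,1) draws to the console without saving them, and a second call gives different numbers. Here is a short sketch of two related tricks. set.seed() makes the draws reproducible (the seed 2013 is an arbitrary choice). The mean and sd arguments give other normal distributions.

```r
# Fix the random number generator so the same draws come back each time
set.seed(2013)
x1 <- rnorm(5)

set.seed(2013)
x2 <- rnorm(5)

identical(x1, x2)  # TRUE: same seed, same draws

# Draw from N(10, 2^2) instead of N(0, 1); sd is a standard deviation, not a variance
y <- rnorm(1000, mean = 10, sd = 2)
c(mean = mean(y), sd = sd(y))  # close to 10 and 2
```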

Now, let's try:

> temp <- rnorm(100)

What is in the object temp?

What is the length of the object temp?

Make an informative plot of temp.
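Here is one way to answer the three questions above. str() and summary() describe the object, and a histogram next to a normal QQ plot checks whether the draws look normal. The two-panel layout is just one choice of plot.

```r
temp <- rnorm(100)

str(temp)       # a numeric vector: num [1:100] ...
length(temp)    # 100
summary(temp)   # min, quartiles, mean, max

# Side-by-side histogram and normal QQ plot
op <- par(mfrow = c(1, 2))
hist(temp, main = "100 draws from N(0,1)", xlab = "temp")
qqnorm(temp)
qqline(temp)    # points close to this line suggest normality
par(op)         # restore the previous plotting layout
```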


Example 2: The Nonparametric Bootstrap

Let's get our own bootstrap function by sourcing it:

> source("http://edoras.sdsu.edu/~babailey/bridges13/bootstrap.r")

(Here is the function: bootstrap.r)

There is a help file available: bootstrap.help

Let's bootstrap the mean of some data. To keep it simple, use the three values 1, 2, 3:

> data <- c(1,2,3)

> results <- bootstrap(x=data,nboot=100,theta=mean)

Let's make a histogram of the 100 bootstrap means:

> hist(results$thetastar)
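The sourced bootstrap() is a ready-made function, but the core idea fits in a few lines of base R. Below is a minimal sketch. The name boot_stat is hypothetical and is not the sourced function. It returns only the vector of bootstrap replicates, not a list like results.

```r
# Hypothetical helper: nonparametric bootstrap of a statistic theta
boot_stat <- function(x, nboot, theta) {
  n <- length(x)
  # Each replicate resamples the data with replacement (same size as x)
  # and recomputes the statistic on the resample; indexing with
  # sample.int() avoids sample()'s special case for a length-one x
  replicate(nboot, theta(x[sample.int(n, n, replace = TRUE)]))
}

set.seed(1)
data <- c(1, 2, 3)
thetastar <- boot_stat(data, nboot = 100, theta = mean)
hist(thetastar)
table(thetastar)  # at most 7 distinct values: 1, 4/3, 5/3, 2, 7/3, 8/3, 3
```

With only three data points, the bootstrap distribution is very coarse. This is one reason the histogram looks lumpy.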

How could you construct a 90% confidence interval (CI)?

> quantile(results$thetastar, c(0.05, 0.95))
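The quantile() call above is the percentile method: it takes the middle 90% of the bootstrap distribution. The sketch below compares it with a normal-approximation interval, mean ± z × bootstrap standard error, where z = qnorm(0.95). The data are simulated (30 draws from an exponential distribution) and are not part of the lab. They are skewed so the two intervals have something to disagree about.

```r
set.seed(42)
x <- rexp(30)            # simulated, skewed sample for illustration only
nboot <- 2000

# Bootstrap replicates of the sample mean
thetastar <- replicate(nboot, mean(x[sample.int(length(x), replace = TRUE)]))

# Percentile interval: 5th and 95th percentiles of the replicates
perc_ci <- quantile(thetastar, c(0.05, 0.95))

# Normal-approximation interval: estimate +/- z * bootstrap standard error
se_boot <- sd(thetastar)
norm_ci <- mean(x) + c(-1, 1) * qnorm(0.95) * se_boot

perc_ci
norm_ci
```

The normal interval is symmetric around mean(x) by construction. With skewed data, the percentile interval usually is not.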

What exactly is in the R object results, anyways? (Hint: the R names command is useful!)


Example 3: Trees

Here is some information on the South African Heart Disease Data.

Let's get the South African Heart Disease Data into R (from my website!):

> sahd <- read.table("http://edoras.sdsu.edu/~babailey/bridges13/SAheart.data", header=TRUE, row.names=1)

What exactly is in the sahd object, anyways?
(The R str and summary commands will give summaries of the dataset!)
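Besides str() and summary(), it is worth checking how text columns were read in. Since R 4.0.0, read.table() keeps text as character by default (stringsAsFactors = FALSE), and passing stringsAsFactors = TRUE turns it into a factor. The sketch below uses a few made-up toy rows, not the real SAheart values. It reads them from an inline string so it runs offline.

```r
# Toy rows for illustration only; the column names mimic the lab data
txt <- "id famhist age chd
1 Present 52 1
2 Absent 63 1
3 Present 46 0"

toy <- read.table(text = txt, header = TRUE, row.names = 1)
str(toy)                 # famhist is chr (character) in R >= 4.0.0
sapply(toy, class)

toy_f <- read.table(text = txt, header = TRUE, row.names = 1,
                    stringsAsFactors = TRUE)
levels(toy_f$famhist)    # "Absent" "Present"
table(toy_f$chd)         # counts of each value of the response
```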

We can make a scatterplot matrix by:

> pairs(sahd)

Before we grow a tree, we have to load the R package rpart. (On the toolbar, go to Packages, select Load package..., then click on rpart in the list to load it.)
OR
If you are on a laptop, you can install packages with the R command install.packages("rpart")!
OR use the Install button in the RStudio Packages pane.

> library(rpart)

Let's look at the help function:

> help(rpart)

Let's grow a tree and look at the tree diagram:

> sahdtree <- rpart(as.factor(chd)~., data=sahd)
> plot(sahdtree)
> text(sahdtree)
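The plotted tree is rpart's default fit. The sketch below shows a few common follow-ups on the same kind of object:

- print() lists the splits.
- printcp() shows the cross-validated error (xerror) for each complexity parameter (cp).
- prune() cuts the tree back at a chosen cp.

It uses R's built-in iris data so it runs offline; on the lab data you would swap in sahdtree. Choosing the cp with the smallest xerror is one common rule, not the only one.

```r
library(rpart)

set.seed(10)  # rpart's cross-validation uses random folds
fit <- rpart(Species ~ ., data = iris)

print(fit)     # text listing of every split and node
printcp(fit)   # cp table with cross-validated error (xerror)

# Prune at the cp value with the smallest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

# uniform spacing and a margin help keep the labels from being clipped
plot(pruned, uniform = TRUE, margin = 0.1)
text(pruned, use.n = TRUE)  # use.n adds class counts at each leaf
```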


Example 4: Random Forest

Before we grow a Random Forest we have to install the R package randomForest (see above).
We'll need to load the package:

> library(randomForest)

Let's look at the help function:

> help(randomForest)

Let's grow a Random Forest:

> sahdrf <- randomForest(as.factor(chd)~., data=sahd, importance=TRUE)

If you get an error, then let's try:

> sahd$chd <- as.factor(sahd$chd)
> sahdrf <- randomForest(chd~., data=sahd, importance=TRUE)
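The as.factor() step matters because the type of the response decides the kind of model. randomForest assumes classification when the response is a factor and regression otherwise, so a 0/1 numeric chd gives a regression forest. rpart follows the same rule when method is not given, which makes the effect easy to see without extra packages. In the sketch below, the am column of the built-in mtcars data (0 = automatic, 1 = manual) stands in for chd.

```r
library(rpart)

# am is stored as numeric 0/1, like chd in the lab data
fit_num <- rpart(am ~ mpg + wt + hp, data = mtcars)
fit_fac <- rpart(as.factor(am) ~ mpg + wt + hp, data = mtcars)

fit_num$method  # "anova": a regression tree predicting a number
fit_fac$method  # "class": a classification tree predicting a label

head(predict(fit_num))                  # numeric fitted values
head(predict(fit_fac, type = "class"))  # factor predictions
```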

Let's look at the output:

> print(sahdrf)

Did you grow enough trees?

> plot(sahdrf)
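plot(sahdrf) draws the out-of-bag (OOB) error rate against the number of trees. For a classification forest, it shows one line for the overall OOB error plus one per class. If the curves have flattened out by the last tree, adding more trees is unlikely to help much. OOB error works because each tree's bootstrap sample leaves out about 1/e ≈ 36.8% of the rows, and those rows act as a test set for that tree. The quick base-R simulation below checks that fraction; the sample size of 1000 is arbitrary.

```r
set.seed(7)
n <- 1000        # number of rows (arbitrary)
nboot <- 500     # number of bootstrap samples, like 500 trees

# Fraction of rows NOT drawn in each bootstrap sample of size n
oob_frac <- replicate(nboot, {
  drawn <- sample.int(n, size = n, replace = TRUE)
  1 - length(unique(drawn)) / n
})

mean(oob_frac)   # close to exp(-1)
exp(-1)          # about 0.368
```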

Let's look at the importance of the variables:

> varImpPlot(sahdrf, type=1)

The unscaled permutation importance has been shown to perform better than the scaled permutation importance when the predictors are correlated.

Let's now look at the unscaled importance of the variables:

> varImpPlot(sahdrf, type=1, scale=FALSE)
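type=1 selects permutation importance (mean decrease in accuracy). The idea is to shuffle one predictor, which breaks its link to the response, and measure how much prediction accuracy drops. According to the package documentation, randomForest does this on each tree's OOB rows and averages the drops over trees. The scaled version then divides that average by the standard deviation of the per-tree differences.

The sketch below is a simplified version of the same idea. It uses a single rpart tree and a held-out half of the built-in iris data. This is a swapped-in model and a single holdout set, not randomForest's implementation, and the function name perm_importance is hypothetical.

```r
library(rpart)

# Hypothetical helper: average drop in accuracy when each predictor is shuffled
perm_importance <- function(fit, test, response, nperm = 20) {
  truth <- test[[response]]
  base_acc <- mean(predict(fit, test, type = "class") == truth)
  predictors <- setdiff(names(test), response)
  sapply(predictors, function(v) {
    drops <- replicate(nperm, {
      shuffled <- test
      shuffled[[v]] <- sample(shuffled[[v]])  # break the link to the response
      base_acc - mean(predict(fit, shuffled, type = "class") == truth)
    })
    mean(drops)  # unscaled: average accuracy drop over permutations
  })
}

set.seed(3)
train_rows <- sample.int(nrow(iris), 75)
fit <- rpart(Species ~ ., data = iris[train_rows, ])
imp <- perm_importance(fit, iris[-train_rows, ], "Species")
sort(imp, decreasing = TRUE)  # petal measurements usually dominate
```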

Let's look at the partial dependence plot for the age variable:

> partialPlot(sahdrf, pred.data=sahd, x.var=age)
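A partial dependence plot shows the model's average prediction as one variable moves across a grid while every other column keeps its observed values. The minimal base-R sketch below does that computation for a classification tree on iris, averaging the predicted probability of one class. The function name partial_dep and the 25-point grid are assumptions. partialPlot() does the same kind of averaging, but for classification it plots a logit-type scale instead of raw probabilities.

```r
library(rpart)

# Hypothetical helper: partial dependence of P(class) on one variable
partial_dep <- function(fit, data, var, class, npts = 25) {
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = npts)
  avg_prob <- sapply(grid, function(g) {
    modified <- data
    modified[[var]] <- g   # set var to the grid value g for every row
    mean(predict(fit, modified, type = "prob")[, class])
  })
  data.frame(x = grid, yhat = avg_prob)
}

fit <- rpart(Species ~ ., data = iris)
pd <- partial_dep(fit, iris, "Petal.Length", "virginica")
plot(pd$x, pd$yhat, type = "l",
     xlab = "Petal.Length", ylab = "Average P(virginica)")
```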

Here is pplots.r, a script that loops through all the variables.


Want more practice? Try out the other examples in the randomForest help file!