Lab 1: Introduction to R (and More!)


Getting Started with R

Click on the RStudio icon to get started! There are lots of tutorials available.

For help in R, for example to get help on the function rnorm:

> help(rnorm)

RStudio has a help window!


Example 1: Let the games begin.

Generate a sample of 100 N(0,1) random variables.

> help(rnorm)

> rnorm(100)

What happened?

Now, let's try:

> temp <- rnorm(100)

What is in the object temp?

What is the length of the object temp?

Make an informative plot of temp.
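
One possible way to answer these questions is sketched below. The seed value is an arbitrary choice, added only so the draw repeats; the histogram and Q-Q plot are just two of many reasonable plots.

```r
# Draw 100 standard normal values and keep them in an object.
set.seed(1)                 # arbitrary seed, only so the draw is repeatable
temp <- rnorm(100)

length(temp)                # number of values stored in temp
summary(temp)               # five-number summary plus the mean

# A histogram and a normal Q-Q plot are two common informative views.
hist(temp, main = "100 draws from N(0,1)", xlab = "temp")
qqnorm(temp)
qqline(temp)                # reference line; points near it suggest normality
```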


Example 2: The Nonparametric Bootstrap

Let's get our own bootstrap function by:

> source("http://edoras.sdsu.edu/~babailey/bridges13/bootstrap.r")

(Here is the function: bootstrap.r)

There is a help file available: bootstrap.help

Let's bootstrap the mean of some data. Let's keep it simple: 1, 2, 3.

> data <- c(1,2,3)

> results <- bootstrap(x=data,nboot=100,theta=mean)

Let's make a histogram of the 100 bootstrap means:

> hist(results$thetastar)

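How could you construct a confidence interval (CI), say at 90%? One option is the percentile interval, which takes the middle 90% of the bootstrap means: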

> quantile(results$thetastar, c(0.05, 0.95))
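
If the sourced function is not available, the same idea can be sketched in base R. This is not the lab's bootstrap.r; the names nboot and thetastar are reused only to mirror the call above, and the seed is an arbitrary choice.

```r
# A minimal nonparametric bootstrap of the mean in base R.
# NOTE: an illustrative sketch, not the lab's bootstrap.r function.
set.seed(42)                          # arbitrary seed for repeatability
data  <- c(1, 2, 3)
nboot <- 100

# Resample the data with replacement nboot times; compute the mean each time.
thetastar <- replicate(nboot, mean(sample(data, replace = TRUE)))

hist(thetastar)                       # bootstrap distribution of the mean

# Percentile interval: the middle 90% of the bootstrap means.
quantile(thetastar, c(0.05, 0.95))
```

With only three data values, each resampled mean can take just a handful of distinct values, so the histogram will look lumpy. That is expected.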

What exactly is in the R object results, anyways? (Hint: the R names command is useful!)


Example 3: Trees

Here is information on the South African Heart Disease Data.

Let's get the South African Heart Disease Data into R (from my website!):

> sahd <- read.table("http://edoras.sdsu.edu/~babailey/bridges13/SAheart.data", header=TRUE, row.names=1)

What exactly is in the sahd object, anyways?
(The R dim, str, and summary commands will give summaries of the dataset!)

Let's make some informative plots for exploratory data analysis (EDA).

Here are commands for some EDA plots
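
As a rough sketch of what such commands might look like, the block below runs on a small simulated data frame so that it works offline. The data frame toy and its column score are made up for illustration; only the names age and chd come from this lab, and the simulated values have nothing to do with the real data. The same calls should apply to sahd once it is loaded.

```r
# Simulated stand-in data (NOT the real sahd data):
# a 0/1 outcome and two numeric predictors.
set.seed(7)                                  # arbitrary seed
toy <- data.frame(
  age   = round(runif(200, 15, 64)),
  score = rnorm(200, mean = 50, sd = 10),    # hypothetical numeric predictor
  chd   = rbinom(200, size = 1, prob = 0.35)
)

dim(toy)        # number of rows and columns
str(toy)        # type of each column
summary(toy)    # per-column summaries

# Compare a predictor across the two outcome groups.
boxplot(age ~ chd, data = toy, xlab = "chd", ylab = "age")

# Scatterplot matrix of the predictors, colored by outcome (0 = black, 1 = red).
pairs(toy[, c("age", "score")], col = toy$chd + 1)
```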

Before we grow a tree, we have to load the R package rpart:
Go to the bottom right window and select Packages. You can scroll or type rpart in the search box. Check the rpart box in the list to load it.
(If you are on a laptop, you can install packages with the R command install.packages("rpart")!)

> library(rpart)

Let's look at the help function:

> help(rpart)

Let's grow a tree and look at the tree diagram:

> sahdtree <- rpart(as.factor(chd)~., data=sahd)
> plot(sahdtree)
> text(sahdtree)
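
The default tree diagram can be hard to read. Here is a sketch of a few extra steps that may help, run on the kyphosis data that ships with rpart so it works without the download. The kyphosis data is only a stand-in; the same calls should carry over to sahdtree.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for sahd here.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

print(fit)          # text listing of the splits
printcp(fit)        # complexity table, useful when deciding how far to prune

# uniform = TRUE spaces the levels evenly; margin leaves room for the labels.
plot(fit, uniform = TRUE, margin = 0.1)
text(fit, use.n = TRUE, cex = 0.8)   # use.n adds class counts at each leaf

# Predicted classes on the training data, cross-tabulated against the truth.
pred <- predict(fit, type = "class")
table(observed = kyphosis$Kyphosis, predicted = pred)
```

Keep in mind that the table is computed on the same data used to grow the tree, so it is optimistic about how well the tree would do on new data.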


Example 4: Random Forest

Before we grow a Random Forest we have to load the R package randomForest:

See if it is already installed on your machine with the command:

> library(randomForest)

If it is not installed, we will have to install it!

Go to the bottom right window and select Packages. Search for randomForest and install it. (There is also an install.packages command!)
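
From the console, the check and the install can be combined. This is one common pattern, not the only one; the variable name have_rf is made up for this sketch.

```r
# Check whether randomForest is available without attaching it.
have_rf <- requireNamespace("randomForest", quietly = TRUE)

# Install only when the package is missing (needs an internet connection).
if (!have_rf) {
  install.packages("randomForest")
}
```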

We'll need to load the package:

> library(randomForest)

Let's look at the help function:

> help(randomForest)

Let's grow a Random Forest:

> sahdrf <- randomForest(as.factor(chd)~., data=sahd, importance=TRUE)

You could also make the variable chd a factor, and then:

> sahd$chd <- as.factor(sahd$chd)
> sahdrf <- randomForest(chd~., data=sahd, importance=TRUE)

Let's look at the output:

> print(sahdrf)

Did you grow enough trees?

> plot(sahdrf)

Let's look at the importance of the variables:

> varImpPlot(sahdrf, type=1)

The unscaled permutation importance has been shown to perform better than the scaled permutation importance when predictors are correlated.

Let's now look at the unscaled importance of the variables:

> varImpPlot(sahdrf, type=1, scale=FALSE)

Let's look at the partial dependence plot for the age variable:

> partialPlot(sahdrf, pred.data=sahd, x.var=age)

Here is pplots.r to loop through all the variables.
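
A loop of this kind might look like the sketch below. It is not the contents of pplots.r, and it uses the built-in iris data as a stand-in so it runs without the download. One catch: partialPlot captures its x.var argument unevaluated, so a bare loop variable would be read as a column literally named after that variable. Passing the name through do.call is one way around that.

```r
library(randomForest)

set.seed(1)                                   # arbitrary seed
rf <- randomForest(Species ~ ., data = iris)  # iris stands in for sahd

predictors <- setdiff(names(iris), "Species")

op <- par(mfrow = c(2, 2))                    # 2 x 2 grid of plots
for (v in predictors) {
  # do.call passes the column name as a character value, which partialPlot accepts.
  do.call(partialPlot, list(
    x = rf, pred.data = iris, x.var = v,
    which.class = "setosa",                   # class whose logit is plotted
    xlab = v, main = paste("Partial dependence on", v)
  ))
}
par(op)                                       # restore the plot layout
```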


WHAT ABOUT REPRODUCIBILITY?

You need to call the set.seed function before any step that uses random numbers, such as rnorm, the bootstrap, or randomForest! (See the help file!)
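
A small sketch of what set.seed does. The seed value 2013 is arbitrary; any integer works, as long as you use the same one each time.

```r
# Same seed, same draws.
set.seed(2013)
first <- rnorm(5)

set.seed(2013)
second <- rnorm(5)

identical(first, second)   # TRUE: resetting the seed repeats the draws

# Without resetting the seed, the generator simply continues.
third <- rnorm(5)
identical(first, third)
```

The same applies to the bootstrap and to randomForest: set the seed on the line just before the call, and the results should repeat from run to run.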


Want more practice? Try out the other examples in the randomForest help file!

What about Regression?

Let's try the example in the help file using airquality
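
A sketch modeled on that help-file example follows. Check help(randomForest) for the exact version shipped with your installation; the seed here is arbitrary.

```r
library(randomForest)

set.seed(131)                       # arbitrary seed so the forest is repeatable

# Ozone is numeric, so randomForest fits a regression forest.
# airquality has missing values; na.omit drops the incomplete rows.
ozone.rf <- randomForest(Ozone ~ ., data = airquality, mtry = 3,
                         importance = TRUE, na.action = na.omit)

print(ozone.rf)                     # mean of squared residuals, % variance explained
importance(ozone.rf)                # %IncMSE and IncNodePurity for each predictor
varImpPlot(ozone.rf, type = 1)      # permutation importance only
```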