Lab 2: More Random Forests


Example 5: Tuning a RF

You should already have the sahd data frame from Lab 1, Example 3, but just in case:

Here is: Information on the South African Heart Disease Data

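Let's get the South African Heart Disease Data into R (stringsAsFactors=TRUE makes sure the famhist column is read as a factor in R 4.0 and later, where read.table no longer does this by default):

> sahd <- read.table("http://edoras.sdsu.edu/~babailey/bridges13/SAheart.data", header=TRUE, row.names=1, stringsAsFactors=TRUE)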

We'll need to load the randomForest package:

> library(randomForest)

Let's set the random seed so that we all have the same fit!

> set.seed(6)
> sahdrf <- randomForest(as.factor(chd)~., data=sahd, importance=TRUE)

Let's convert chd to a factor once with as.factor, so we don't have to keep wrapping it in every call:

> sahd$chd <- as.factor(sahd$chd)

Let's look at the output:

> print(sahdrf)

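The printed summary reports the out-of-bag (OOB) error estimate and the OOB confusion matrix. Here is a minimal sketch of pulling those pieces out of the fitted object directly. The helper name oob_summary is made up for this lab and is not part of randomForest; the block only runs on sahdrf if that object is already in your workspace.

```r
library(randomForest)

# Hypothetical helper (our own name, not a randomForest function):
# collect the OOB error, confusion matrix and importance table
# from a fitted classification forest.
oob_summary <- function(rf) {
  list(
    oob_error  = unname(rf$err.rate[rf$ntree, "OOB"]),  # OOB error after the last tree
    confusion  = rf$confusion,                           # OOB confusion matrix
    importance = importance(rf)                          # fit with importance=TRUE to get accuracy-based measures
  )
}

# Run on the lab's forest if it exists in the workspace
if (exists("sahdrf")) {
  print(oob_summary(sahdrf))
  varImpPlot(sahdrf)  # dot charts of the importance measures
}
```

The err.rate matrix has one row per tree, so plotting its "OOB" column against the row index is a quick way to see whether the error has leveled off.
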
Let's tune our forest!

> help(tuneRF)

> sahdrf_tune <- tuneRF(sahd[,-10], sahd[,10])
> sahdrf_tune

The default mtry (floor(sqrt(9)) = 3 for classification with 9 predictors) was pretty good!
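
tuneRF searches over mtry by multiplying and dividing the starting value by a step factor and comparing OOB errors. Here is a rough sketch with the search settings spelled out instead of left at their defaults. The wrapper name best_mtry and the particular values for ntree_try, step and improve are our own choices for illustration, not recommendations from the package.

```r
library(randomForest)

# Hypothetical wrapper (our own name): run tuneRF with explicit search
# settings and return the mtry that had the lowest OOB error.
best_mtry <- function(x, y, ntree_try = 500, step = 1.5, seed = 6) {
  set.seed(seed)
  res <- tuneRF(x, y,
                mtryStart  = floor(sqrt(ncol(x))),  # classification default
                ntreeTry   = ntree_try,             # trees grown per candidate mtry
                stepFactor = step,                  # mtry is inflated/deflated by this factor
                improve    = 0.01,                  # relative OOB improvement needed to keep searching
                trace = FALSE, plot = FALSE)
  # res is a matrix with columns "mtry" and "OOBError"
  unname(res[which.min(res[, "OOBError"]), "mtry"])
}

# Run on the lab's data if it exists in the workspace
if (exists("sahd")) {
  print(best_mtry(sahd[, -10], as.factor(sahd[, 10])))
}
```

Because each candidate forest is random, the winning mtry can change from seed to seed, so small differences in OOB error should not be over-interpreted.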

Let's try again, for fun!

> set.seed(12)
> sahdrf <- randomForest(as.factor(chd)~., data=sahd, importance=TRUE)
> sahdrf_tune <- tuneRF(sahd[,-10], sahd[,10])


Example 6: Variable Importance Rankings and Significance

There is a new R package, rfPermute, that gives p-values for the variable importance measures using a permutation test, so you can check it out!
Here is a link to some info

Let's look at the R code for an example (rfpermex.r). Use at your own risk!

Seems to give reasonable results!
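
To see the idea behind a permutation test for importance, here is a hand-rolled sketch that uses only randomForest. It is not the rfPermute API and not the code in rfpermex.r. It refits the forest on shuffled copies of the response and asks how often a shuffled fit gives an importance at least as large as the real one. The function name perm_importance_pvals and the defaults n_perm and ntree are assumptions chosen to keep the run time short.

```r
library(randomForest)

# Hypothetical helper (NOT the rfPermute API): permutation p-values for
# the unscaled mean-decrease-in-accuracy importance of each predictor.
perm_importance_pvals <- function(x, y, n_perm = 100, ntree = 200) {
  get_imp <- function(resp) {
    rf <- randomForest(x, resp, ntree = ntree, importance = TRUE)
    importance(rf, type = 1, scale = FALSE)[, 1]
  }
  observed <- get_imp(y)
  # Null distribution: shuffling the response breaks any real link
  # between the predictors and the response.
  null_imp <- replicate(n_perm, get_imp(sample(y)))
  # Share of null importances at least as large as the observed one;
  # the +1 counts the observed fit itself, so p is never exactly 0.
  (rowSums(null_imp >= observed) + 1) / (n_perm + 1)
}

# Run on the lab's data if it exists in the workspace
if (exists("sahd")) {
  set.seed(6)
  print(perm_importance_pvals(sahd[, -10], as.factor(sahd[, 10])))
}
```

With n_perm permutations the smallest possible p-value is 1/(n_perm + 1), so use more permutations if you need finer resolution.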


Example 7: Iris Data: Unsupervised RF and Clustering

We will need the cluster library for PAM clustering:

> library(cluster)

We will also need the function that the REU group used: pamFunction.r

Let's look at the R code in make.rf_iris_plots.r
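
If you don't have those files handy, here is a minimal stand-alone sketch of the same general idea. It is not the REU group's pamFunction.r or make.rf_iris_plots.r. It assumes that 1 - proximity is an acceptable dissimilarity and that k = 3 clusters is the right choice for iris; both are our assumptions.

```r
library(randomForest)
library(cluster)

set.seed(6)
# No response is supplied, so randomForest runs in unsupervised mode
iris_urf <- randomForest(x = iris[, 1:4], ntree = 500, proximity = TRUE)

# Proximities are similarities in [0, 1]; turn them into dissimilarities
iris_diss <- as.dist(1 - iris_urf$proximity)

# PAM clustering with k = 3 on the RF dissimilarity
iris_pam <- pam(iris_diss, k = 3)

# Compare the clusters with the known species labels
print(table(cluster = iris_pam$clustering, species = iris$Species))

# Multidimensional scaling plot of the proximities, colored by species
MDSplot(iris_urf, iris$Species)
```

The species labels are used only to check the clusters afterwards; the forest and PAM never see them.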

If you would like to look at the REU Project Paper and their code for RF Figures, check out:
Bio_math_REUT_paper_2009.pdf
make.rf_plots2.r