You should already have the sahd data frame from Lab 1, Example 3, but just in case:
Here is information on the South African Heart Disease Data
Let's get the South African Heart Disease Data into R:
> sahd <- read.table("http://edoras.sdsu.edu/~babailey/bridges13/SAheart.data", header=TRUE, row.names=1)
We'll need to load the randomForest package:
> library(randomForest)
Let's set the random seed so that we all have the same fit!
> set.seed(6)
> sahdrf <- randomForest(as.factor(chd)~., data=sahd, importance=TRUE)
Let's convert chd to a factor once with as.factor, so we don't have to keep wrapping it in the formula (tuneRF will also need a factor response to do classification)!
> sahd$chd <- as.factor(sahd$chd)
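Why does the factor conversion matter? Here is a minimal sketch on made-up toy data (the data frame toy and its columns are hypothetical, not part of sahd). It shows that randomForest picks regression or classification based on the class of the response:

```r
library(randomForest)

# Toy data (hypothetical, not the sahd data): a 0/1 response and two predictors
set.seed(1)
toy <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
toy$y <- rep(c(0, 1), times = 20)

# Numeric 0/1 response: randomForest fits a regression forest
# (it warns about the few unique values, so we silence that here)
rf_num <- suppressWarnings(randomForest(y ~ ., data = toy, ntree = 50))
rf_num$type

# Factor response: randomForest fits a classification forest
toy$y <- as.factor(toy$y)
rf_fac <- randomForest(y ~ ., data = toy, ntree = 50)
rf_fac$type
```

So if chd were left as a numeric 0/1 column, we would quietly get a regression forest instead of a classifier.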
Let's look at the output:
> print(sahdrf)
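The printed summary can also be pulled apart piece by piece. This is a small sketch on the built-in iris data (a stand-in, so it runs without downloading anything); the same calls should work on sahdrf:

```r
library(randomForest)

set.seed(6)
irisrf <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Confusion matrix from the out-of-bag (OOB) predictions, with per-class error
irisrf$confusion

# OOB error rate after the last tree (the "OOB" column of err.rate)
oob_err <- irisrf$err.rate[irisrf$ntree, "OOB"]
oob_err

# Variable importance: the accuracy-based measures are there
# because we fit with importance=TRUE
imp <- importance(irisrf)
imp
varImpPlot(irisrf)
```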
Let's tune our forest!
> help(tuneRF)
> sahdrf_tune <- tuneRF(sahd[,-10], sahd[,10])
> sahdrf_tune
The default mtry (floor(sqrt(9)) = 3 for our 9 predictors) was pretty good!
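To read the tuneRF output programmatically, here is a rough sketch, again on iris as a stand-in (4 predictors, so the classification default is mtry = 2). The argument values ntreeTry = 200 and the object names are just choices for this illustration:

```r
library(randomForest)

# Stand-in data: predictors in x, factor response in y
x <- iris[, -5]
y <- iris[, 5]

# Default mtry for classification is floor(sqrt(number of predictors))
default_mtry <- floor(sqrt(ncol(x)))
default_mtry

set.seed(6)
# tuneRF returns a matrix: column 1 = mtry values tried, column 2 = OOB error
tune_out <- tuneRF(x, y, ntreeTry = 200, stepFactor = 2, improve = 0.05,
                   trace = FALSE, plot = FALSE)
tune_out

# The mtry with the smallest OOB error among those tried
best_mtry <- tune_out[which.min(tune_out[, 2]), 1]
best_mtry
```

The OOB errors at neighboring mtry values are often very close, so a different seed can change which one "wins".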
Let's try again, for fun!
> set.seed(12)
> sahdrf <- randomForest(chd~., data=sahd, importance=TRUE)
> sahdrf_tune <- tuneRF(sahd[,-10], sahd[,10])
There is a newer R package, rfPermute, that gives p-values for the variable importance measures using a permutation test, so check it out!
Here is a link to some info
Let's look at the R code for an example (rfpermex.r). Use at your own risk!
Seems to give reasonable results!
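To see the idea behind a permutation test for importance, here is a hand-rolled sketch that uses only randomForest. It is not the rfPermute code and not rfpermex.r. It is the same general idea (shuffle the response, refit, compare), shown on iris with a small number of permutations (nrep = 25 is an arbitrary choice to keep it fast):

```r
library(randomForest)

set.seed(12)
nrep <- 25   # number of permutations (small so the sketch runs quickly)

# Observed importance (mean decrease in accuracy) on the real labels
obs_rf  <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 200)
obs_imp <- importance(obs_rf, type = 1, scale = FALSE)[, 1]

# Null distribution: refit after shuffling the response, which breaks any
# real association between the predictors and the labels
null_imp <- replicate(nrep, {
  perm <- iris
  perm$Species <- sample(perm$Species)
  perm_rf <- randomForest(Species ~ ., data = perm, importance = TRUE, ntree = 200)
  importance(perm_rf, type = 1, scale = FALSE)[, 1]
})

# One-sided permutation p-value per predictor, with the usual +1 correction
p_val <- (rowSums(null_imp >= obs_imp) + 1) / (nrep + 1)
p_val
```

With only 25 permutations the smallest possible p-value is 1/26, so use a larger nrep for anything you plan to report.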
We will need the cluster library for PAM (Partitioning Around Medoids) clustering:
> library(cluster)
We will also need the function that the REU group used: pamFunction.r
Let's look at the R code in make.rf_iris_plots.r
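The contents of pamFunction.r and make.rf_iris_plots.r are not reproduced here, so the following is only a guess at the general approach: one common way to combine random forests with PAM is to cluster on the forest's proximity matrix. A minimal sketch on iris, assuming an unsupervised forest and k = 3 clusters:

```r
library(randomForest)
library(cluster)

set.seed(6)
# Unsupervised forest: no response is given, so randomForest learns
# proximities by separating the real data from a synthetic copy
iris_urf <- randomForest(x = iris[, -5], proximity = TRUE, ntree = 500)

# Turn proximities (similarities in [0, 1]) into dissimilarities
rf_diss <- as.dist(1 - iris_urf$proximity)

# PAM with k = 3 clusters on the dissimilarity object
iris_pam <- pam(rf_diss, k = 3, diss = TRUE)

# Compare the clusters with the known species labels
table(cluster = iris_pam$clustering, species = iris$Species)
```

The cluster numbers are arbitrary labels, so look for whether each species lands mostly in one cluster rather than for matching numbers.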
If you would like to look at the REU Project Paper and their code for RF Figures, check out:
Bio_math_REUT_paper_2009.pdf
make.rf_plots2.r