To start R, click on the R icon.
To quit R:
> q()
What does it ask you? What does this mean?
For R help, for example to get help on the function rnorm:
> help(rnorm)
You can also go to Help on the R toolbar and select R Help.
Generate a sample of 100 N(0,1) random variables.
> help(rnorm)
> rnorm(100)
What happened?
Now, let's try:
> temp <- rnorm(100)
What is in the object temp?
What is the length of the object temp?
Make an informative plot of temp.
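One possible way to explore temp is sketched below. The seed value and the choice of plots are assumptions made for illustration; other plots may be just as informative.

```r
# Fix the seed so the same 100 values are drawn on every run (42 is arbitrary).
set.seed(42)
temp <- rnorm(100)

# Typing an object's name prints all of it; head() shows only the first few values.
head(temp)

# Number of values stored in temp.
length(temp)

# Two views of the sample: a histogram and a normal Q-Q plot.
hist(temp, main = "100 draws from N(0,1)", xlab = "temp")
qqnorm(temp)
qqline(temp)
```

Without set.seed(), each call to rnorm(100) returns a different sample, which is why assigning the result to temp is useful.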
Let's get our own bootstrap function by:
> source("http://edoras.sdsu.edu/~babailey/bridges13/bootstrap.r")
(Here is the function: bootstrap.r)
There is a help file available: bootstrap.help
Let's bootstrap the mean of data. Let's keep it simple: 1,2,3
> data <- c(1,2,3)
> results <- bootstrap(x=data,nboot=100,theta=mean)
Let's make a histogram of the 100 bootstrap means:
> hist(results$thetastar)
How could you construct a CI? (say, 90%)
> quantile(results$thetastar, c(0.05, 0.95))
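To see what a bootstrap of this kind does under the hood, here is a minimal sketch. The function my_bootstrap is a hypothetical stand-in written for this illustration; it is not the code in bootstrap.r, and it only assumes that the result is a list with a thetastar component, as used above.

```r
# Hypothetical stand-in for the sourced bootstrap(); not the author's code.
my_bootstrap <- function(x, nboot, theta) {
  # Resample x with replacement nboot times and apply theta to each resample.
  thetastar <- replicate(nboot, theta(sample(x, size = length(x), replace = TRUE)))
  list(thetastar = thetastar)
}

set.seed(1)  # arbitrary seed, for reproducibility only
data <- c(1, 2, 3)
results <- my_bootstrap(x = data, nboot = 100, theta = mean)

# Percentile interval: the 5% and 95% quantiles bracket the middle 90%
# of the bootstrap means.
quantile(results$thetastar, c(0.05, 0.95))
```

With only three data points the bootstrap means can take just a handful of distinct values, so the histogram is lumpy and the interval endpoints are coarse.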
What exactly is in the R object results, anyways? (Hint: the R names command is useful!)
Here is information on the South African Heart Disease Data.
Let's get the South African Heart Disease Data into R (from my website!):
> sahd <- read.table("http://edoras.sdsu.edu/~babailey/bridges13/SAheart.data", header=TRUE, row.names=1)
What exactly is in the sahd object, anyways?
(The R str and summary commands will give summaries of the dataset!)
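The usual commands for inspecting a data frame are sketched below. To keep the example runnable without a network connection, it uses mtcars, a data set that ships with R, as a stand-in; the same calls would apply to sahd.

```r
# mtcars ships with R and is used here only as a stand-in for sahd.
df <- mtcars

dim(df)      # number of rows and columns
names(df)    # column names
str(df)      # type and first few values of each column
summary(df)  # min, quartiles, mean, and max of each numeric column
head(df, 3)  # first three rows
```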
We can make a scatterplot matrix by:
> pairs(sahd)
Before we grow a tree, we have to load the R package rpart:
(Go to the toolbar; under Packages, select Load Packages, then click on rpart in the list to load it.)
OR
If you are on a laptop, you can install packages with the R command install.packages("rpart")!
OR use the RStudio Install.
> library(rpart)
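If you are not sure whether a package is already installed, one common pattern is to install it only when it is missing. This is a sketch; it assumes a CRAN mirror is reachable when installation is needed.

```r
# Install rpart only when it is not already available, then load it.
if (!requireNamespace("rpart", quietly = TRUE)) {
  install.packages("rpart")
}
library(rpart)
```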
Let's look at the help function:
> help(rpart)
Let's grow a tree and look at the tree diagram:
> sahdtree <- rpart(as.factor(chd)~., data=sahd)
> plot(sahdtree)
> text(sahdtree)
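The default tree plot can be hard to read. The sketch below shows a few optional arguments that may help; it uses kyphosis, an example data set included with rpart, as a stand-in for sahd so that it runs offline. The margin and cex values are arbitrary choices.

```r
library(rpart)

# kyphosis is an example data set included with rpart; a stand-in for sahd.
kyphtree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class")

# Complexity table: one row per candidate tree size, with cross-validated error.
printcp(kyphtree)

# Draw the tree with some margin so the labels are not clipped,
# and show the class counts at each leaf.
plot(kyphtree, margin = 0.1)
text(kyphtree, use.n = TRUE, cex = 0.8)
```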
Before we grow a Random Forest we have to install the R package randomForest (see above).
We'll need to load the package:
> library(randomForest)
Let's look at the help function:
> help(randomForest)
Let's grow a Random Forest:
> sahdrf <- randomForest(as.factor(chd)~., data=sahd, importance=TRUE)
If you get an error, then let's try:
> sahd$chd <- as.factor(sahd$chd)
> sahdrf <- randomForest(chd~., data=sahd, importance=TRUE)
Let's look at the output:
> print(sahdrf)
Did you grow enough trees?
> plot(sahdrf)
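One way to judge whether enough trees were grown is to check that the out-of-bag (OOB) error has levelled off by the last trees. The sketch below uses the built-in iris data as a stand-in for sahd; the seed and ntree values are arbitrary, and the code is skipped if randomForest is not installed.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(1)  # arbitrary seed, for reproducibility only

  # iris is a stand-in for sahd; 500 trees is the package default.
  irisrf <- randomForest(Species ~ ., data = iris, ntree = 500)

  # err.rate has one row per number of trees grown so far;
  # the "OOB" column is the overall out-of-bag error rate.
  oob <- irisrf$err.rate[, "OOB"]
  print(tail(oob, 3))

  # The same information as curves: a flat right-hand end suggests
  # that adding more trees would change little.
  plot(irisrf)
}
```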
Let's look at the importance of the variables:
> varImpPlot(sahdrf, type=1)
The unscaled permutation importance has been shown to perform better than the scaled permutation importance when the predictors are correlated.
Let's now look at the unscaled importance of the variables:
> varImpPlot(sahdrf, type=1, scale=FALSE)
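The numbers behind these plots can be extracted with importance(). The sketch below compares the scaled and unscaled permutation measures side by side, using the built-in iris data as a stand-in for sahd; it is skipped if randomForest is not installed.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(1)  # arbitrary seed, for reproducibility only

  # importance = TRUE is needed to compute the permutation measure.
  irisrf <- randomForest(Species ~ ., data = iris, importance = TRUE)

  # type = 1 selects the permutation (mean decrease in accuracy) measure.
  scaled   <- importance(irisrf, type = 1)                 # divided by its standard error
  unscaled <- importance(irisrf, type = 1, scale = FALSE)  # raw mean decrease

  print(cbind(scaled = scaled[, 1], unscaled = unscaled[, 1]))
}
```

The two columns are on different scales, so compare the ranking of the variables rather than the raw values.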
Let's look at the partial dependence plot for the age variable:
> partialPlot(sahdrf, pred.data=sahd, x.var=age)
Here is pplots.r to loop through all the variables.
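A loop over all predictors could look roughly like the sketch below. It is not the contents of pplots.r; it uses the built-in iris data as a stand-in for sahd and is skipped if randomForest is not installed. Because partialPlot reads x.var unevaluated, writing x.var=v inside a loop would look for a variable literally named v, so the sketch passes the name as a string through do.call.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(1)  # arbitrary seed, for reproducibility only

  irisrf <- randomForest(Species ~ ., data = iris)
  predictors <- setdiff(names(iris), "Species")

  # One panel per predictor on a 2 x 2 grid; restore the old layout afterwards.
  op <- par(mfrow = c(2, 2))
  for (v in predictors) {
    # do.call evaluates v first, so partialPlot receives the column name
    # as a character string rather than the symbol v.
    do.call(partialPlot, list(x = irisrf, pred.data = iris, x.var = v,
                              xlab = v,
                              main = paste("Partial dependence on", v)))
  }
  par(op)
}
```

For a classification forest, partialPlot shows the dependence for the first class unless which.class is given.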