Goals
After working through this handout, you should:
Plots in R
One of the ways in which R really stands out is its ability to generate
plots for visualization. R includes many specialized functions for
specific types of plots, and many more are added by R packages and
add-on plotting frameworks like ggplot2. Today, however, we will focus
on the base R function plot, which provides generic x-y plotting, offers
considerable versatility, and lets us examine many general aspects of
creating plots. First, let’s look at the main types of plots the
plot function generates.
# Simple scatterplot of two variables, x and y
x <- rnorm(100, mean = 0, sd = 1)
y <- rnorm(100, mean = 0, sd = 1)
# xlab and ylab specify the x- and y-axis labels; col specifies the
# color used for the plot. For a list of valid colors, see
# http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
plot(x, y, xlab = "X value", ylab = "Y value", col = "darkgray", pch = 19)
# pch specifies the plot symbol; values 0:25 are valid. Here is what
# they look like (bg sets the fill color, which applies to symbols 21:25)
plot(0:25, rep(1, 26), pch = 0:25, xlab = "pch symbol", ylab = "", bg = "red")
# Here is the first plot again with a different symbol and color
plot(x, y, xlab = "X value", ylab = "Y value", col = "cadetblue", bg = "orange",
     pch = 25)
# We can also make line plots. Let's generate a random walk to plot as
# a line plot; note type = 'l'
val <- rep(0, 100)
for (i in 2:100) {
    val[i] <- val[i - 1] + rnorm(1, 0, 0.5)
}
# lwd adjusts the line width; values > 1 make it wider than the default
plot(1:100, val, type = "l", xlab = "Step", ylab = "Value", col = "firebrick",
     lwd = 1.5)
# Finally, we can make a histogram- or barplot-style plot with type = 'h'.
# In some cases hist() or barplot() might be better options
plot(1:100, rnorm(100, 0, 1), type = "h", col = "black", xlab = "Item",
     ylab = "Value")
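The loop used above to build the random walk can also be written without a loop. The following is a minimal sketch, assuming the same step distribution (normal, mean 0, sd 0.5). The names steps and walk and the call to set.seed are illustrative additions, not part of the original example.

```r
# Sketch: a vectorized version of the random walk (names are illustrative)
set.seed(42)                            # assumption: fixed seed so the walk is reproducible
steps <- rnorm(99, mean = 0, sd = 0.5)  # 99 random steps
walk <- c(0, cumsum(steps))             # start at 0, then accumulate the steps
plot(1:100, walk, type = "l", xlab = "Step", ylab = "Value", col = "firebrick",
     lwd = 1.5)
```

cumsum returns the running total of its input, so walk[i] is the sum of the first i - 1 steps, which is the quantity the loop builds one element at a time.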
Once you have made a plot with the plot function, you can add elements
to it. The examples below show how to add points, lines, a title, and a
legend.
# First let's generate a vector x, which you can think of as a
# covariate in some model
x <- rnorm(200, mean = 0, sd = 1)
# Now let's assume the first 100 values belong to one group and the
# next 100 to another
grp <- rep(c(1, 2), each = 100)
# Let's assume group has some effect on a response variable, as does
# the covariate
y <- grp * 0.8 + x * 0.4 + rnorm(200, 0, 0.4)
# Now we will make an x-y scatterplot and color the points based on
# their group
plot(x, y, col = c("orange", "cadetblue")[grp], pch = 19, xlab = "X value",
     ylab = "Y value")
# Let's add a legend; the first two arguments are the x and y
# coordinates of its top-left corner
legend(-2.5, 2.8, c("group 1", "group 2"), pch = 19, col = c("orange",
    "cadetblue"))
# And a title
title(main = "My plot")
# Last let's add the mean value for x and y for each group
g1mnX <- mean(x[grp == 1])
g1mnY <- mean(y[grp == 1])
g2mnX <- mean(x[grp == 2])
g2mnY <- mean(y[grp == 2])
# Note the use of the points function to add to the plot; cex adjusts
# the symbol size
points(g1mnX, g1mnY, pch = 21, cex = 1.8, bg = "orange")
points(g2mnX, g2mnY, pch = 21, cex = 1.8, bg = "cadetblue")
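With more than two groups, writing one mean() call per group gets tedious. Here is a hedged sketch of one alternative that computes all group means at once with tapply and draws them with a single points call. It regenerates the same kind of data as above. The seed and the names grp_cols, mean_x and mean_y are assumptions added for illustration.

```r
# Sketch: compute and plot all group means at once (variable names are illustrative)
set.seed(1)                                   # assumption: seed added for reproducibility
x <- rnorm(200, mean = 0, sd = 1)
grp <- rep(c(1, 2), each = 100)
y <- grp * 0.8 + x * 0.4 + rnorm(200, 0, 0.4)
grp_cols <- c("orange", "cadetblue")

plot(x, y, col = grp_cols[grp], pch = 19, xlab = "X value", ylab = "Y value")
mean_x <- tapply(x, grp, mean)                # one mean of x per group
mean_y <- tapply(y, grp, mean)                # one mean of y per group
# points() is vectorized, so one call draws both group means
points(mean_x, mean_y, pch = 21, cex = 1.8, bg = grp_cols)
```

Because bg is given as a vector, each mean is filled with the color of its own group, in the same order as the groups returned by tapply.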
# Now let's plot 20 random walks
val <- matrix(0, nrow = 20, ncol = 300)
for (i in 1:20) {
    # loop over 20 walks
    for (j in 2:300) {
        # loop over steps
        val[i, j] <- val[i, j - 1] + rnorm(1, 0, 1)
    }
}
# Plot first line/walk, make sure y range can accommodate all walks
lb <- min(as.vector(val))
ub <- max(as.vector(val))
# Note the use of ylim to define the y limits (xlim works similarly);
# cex.lab adjusts the size of the axis labels
plot(1:300, val[1, ], type = "l", ylim = c(lb, ub), xlab = "Step", ylab = "Value",
     cex.lab = 1.2)
for (i in 2:20) {
    # lines() adds a line to the existing plot
    lines(1:300, val[i, ])
}
# We can also add a horizontal line at y = 0. This is done with the
# abline function; here h = horizontal, v = vertical
abline(h = 0, lty = 2, col = "red", lwd = 1.5)  # lty: 1 = solid, 2 = dashed, 3 = dotted
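The plot-then-lines pattern above can also be expressed with matplot, a base R function that draws one line per column of a matrix and chooses axis limits that fit all of them. The sketch below is one possible variant, not the handout's method. It stores each walk in a column (the example above stores walks in rows), and the seed and the names n_walks, n_steps, steps and walks are illustrative assumptions.

```r
# Sketch: build 20 random walks without loops and draw them with matplot()
set.seed(7)                                    # assumption: seed for reproducibility
n_walks <- 20
n_steps <- 300
# Each column holds the steps of one walk; the first step is 0 so every walk starts at 0
steps <- rbind(0, matrix(rnorm(n_walks * (n_steps - 1)), ncol = n_walks))
walks <- apply(steps, 2, cumsum)               # n_steps x n_walks matrix of positions
# matplot() draws one line per column and picks y limits that fit all of them
matplot(1:n_steps, walks, type = "l", lty = 1, col = "black", xlab = "Step",
        ylab = "Value", cex.lab = 1.2)
abline(h = 0, lty = 2, col = "red", lwd = 1.5)
```

This removes the need to compute lb and ub by hand, at the cost of less control over how each individual line is drawn.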
Many common options for plotting are described on the help page for
par. These include graphical parameters that can also be passed to the
plot function and options for laying out plots. Let’s look briefly at
the layout options and make a four-panel figure.
# Here we define a layout for a 2 (row) x 2 (column) plot. Fancier
# layouts are possible with the layout function
par(mfrow = c(2, 2))
# Let's make the four plots, each a line plot of x vs x^n where n =
# 1,2,3,4
x <- 1:100
for (i in 1:4) {
    plot(x, x^i, xlab = "X value", ylab = "Power of X", type = "l", lwd = 1.3)
    title(main = paste("(", letters[i], ")"))
}
# Same as before, but force the panels to be squares and adjust
# margins
par(mfrow = c(2, 2))
par(mar = c(4, 4.5, 2, 1.5))  # margins: bottom, left, top, right
par(pty = "s")  # "s" makes each plotting region square
x <- 1:100
for (i in 1:4) {
    plot(x, x^i, xlab = "X value", ylab = "Power of X", type = "l", lwd = 1.3)
    title(main = paste("(", letters[i], ")"))
}
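Settings changed with par stay in effect for later plots on the same graphics device. A common idiom, shown here as a minimal sketch, is to keep the value that par returns (the previous settings) and pass it back to par when the figure is done. The names op and restored_mfrow are illustrative, and paste0 is used so that the panel titles read (a) rather than ( a ).

```r
# Sketch: change layout settings for one figure, then restore the old ones
op <- par(mfrow = c(2, 2), mar = c(4, 4.5, 2, 1.5), pty = "s")  # op holds the previous values
x <- 1:100
for (i in 1:4) {
    plot(x, x^i, xlab = "X value", ylab = "Power of X", type = "l", lwd = 1.3)
    title(main = paste0("(", letters[i], ")"))
}
par(op)  # back to the settings in place before this figure
restored_mfrow <- par("mfrow")
```

After par(op), the next call to plot fills the whole device again instead of one quarter of it.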
Let’s apply what you have learned. First, you will need a data set; we
can load one that is already available in R. Let’s use data from a plant
growth experiment, which you can access like this:
data(Loblolly)
Loblolly
## height age Seed
## 1 4.51 3 301
## 15 10.89 5 301
## 29 28.72 10 301
## 43 41.74 15 301
## 57 52.70 20 301
## 71 60.92 25 301
## 2 4.55 3 303
## 16 10.92 5 303
## 30 29.07 10 303
## 44 42.83 15 303
## 58 53.88 20 303
## 72 63.39 25 303
## 3 4.79 3 305
## 17 11.37 5 305
## 31 30.21 10 305
## 45 44.40 15 305
## 59 55.82 20 305
## 73 64.10 25 305
## 4 3.91 3 307
## 18 9.48 5 307
## 32 25.66 10 307
## 46 39.07 15 307
## 60 50.78 20 307
## 74 59.07 25 307
## 5 4.81 3 309
## 19 11.20 5 309
## 33 28.66 10 309
## 47 41.66 15 309
## 61 53.31 20 309
## 75 63.05 25 309
## 6 3.88 3 311
## 20 9.40 5 311
## 34 25.99 10 311
## 48 39.55 15 311
## 62 51.46 20 311
## 76 59.64 25 311
## 7 4.32 3 315
## 21 10.43 5 315
## 35 27.16 10 315
## 49 40.85 15 315
## 63 51.33 20 315
## 77 60.07 25 315
## 8 4.57 3 319
## 22 10.57 5 319
## 36 27.90 10 319
## 50 41.13 15 319
## 64 52.43 20 319
## 78 60.69 25 319
## 9 3.77 3 321
## 23 9.03 5 321
## 37 25.45 10 321
## 51 38.98 15 321
## 65 49.76 20 321
## 79 60.28 25 321
## 10 4.33 3 323
## 24 10.79 5 323
## 38 28.97 10 323
## 52 42.44 15 323
## 66 53.17 20 323
## 80 61.62 25 323
## 11 4.38 3 325
## 25 10.48 5 325
## 39 27.93 10 325
## 53 40.20 15 325
## 67 50.06 20 325
## 81 58.49 25 325
## 12 4.12 3 327
## 26 9.92 5 327
## 40 26.54 10 327
## 54 37.82 15 327
## 68 48.43 20 327
## 82 56.81 25 327
## 13 3.93 3 329
## 27 9.34 5 329
## 41 26.08 10 329
## 55 37.79 15 329
## 69 48.31 20 329
## 83 56.43 25 329
## 14 3.46 3 331
## 28 9.05 5 331
## 42 25.85 10 331
## 56 39.15 15 331
## 70 49.12 20 331
## 84 59.49 25 331
The Loblolly data frame has 84 rows and 3 columns of
records of the growth of Loblolly pine trees.
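Before plotting, it can help to check how the data are organized. The following sketch uses only base R functions; the names n_trees, ages and one_tree are illustrative, and selecting tree "301" is just an example choice.

```r
# Sketch: inspect the structure of Loblolly before plotting
data(Loblolly)
str(Loblolly)                         # column types; Seed is a factor identifying each tree
n_trees <- nlevels(Loblolly$Seed)     # number of distinct trees
ages <- sort(unique(Loblolly$age))    # ages at which heights were recorded
table(Loblolly$Seed)                  # number of measurements per tree
# Rows belonging to one tree, selected with a logical index
one_tree <- Loblolly[Loblolly$Seed == "301", ]
```

The same kind of logical index can be used to pull out the height and age values of any single tree.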
Here is your task. Generate a plot showing the growth (height) of each
tree (identified by Seed) over time (given by age). I suggest using a
line plot with one line per tree. Use a different color for each tree
and include a legend showing which color was used for each tree (seed
number). You can use colors() to get the full set of named R colors and
then index into it to choose a subset, or rainbow(14) to generate 14
colors sampled from across the rainbow.
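As a hint for the color step only (this is not a solution to the task), here is a small sketch of the two approaches just mentioned. The evenly spaced indexes into colors() are an arbitrary assumption, any 14 indexes would do, and the names cols_rainbow, cols_named and idx are illustrative.

```r
# Sketch: two ways to build a vector of 14 colors (names are illustrative)
n_trees <- 14
cols_rainbow <- rainbow(n_trees)                     # 14 evenly spaced hues
all_cols <- colors()                                 # every named color R knows
idx <- round(seq(1, length(all_cols), length.out = n_trees))
cols_named <- all_cols[idx]   # assumption: evenly spaced indexes, any 14 would do
# Preview both palettes side by side before using them in a plot
par(mfrow = c(1, 2))
barplot(rep(1, n_trees), col = cols_rainbow, axes = FALSE, main = "rainbow(14)")
barplot(rep(1, n_trees), col = cols_named, axes = FALSE, main = "colors() subset")
par(mfrow = c(1, 1))
```

Either vector can then be indexed by tree, in the same way that c("orange", "cadetblue")[grp] was indexed by group earlier.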