Goals

After working through this handout, you should:

Plots in R

One of the ways in which R really stands out is in its ability to generate plots for visualization. R includes many specialized functions for generating specific types of plots, especially when considering plotting functions added by R packages and add-on plotting frameworks like ggplot2. However, today we will focus on the base R plot function, plot, which offers considerable versatility and allows us to examine many general aspects of creating plots. This function provides generic x-y plotting capabilities. First, let’s look at the main types of plots the plot function generates.

# Simple scatterplot of two variables, x and y
x <- rnorm(100, mean = 0, sd = 1)
y <- rnorm(100, mean = 0, sd = 1)
# xlab and ylab specify the x- and y-axis labels; col specifies the
# color used for the plot. For a list of valid colors see
# http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
plot(x, y, xlab = "X value", ylab = "Y value", col = "darkgray", pch = 19)

# pch specifies the plot symbol; values 0:25 are valid. Here is what
# they look like (bg sets the fill color, which applies to symbols 21:25)
plot(0:25, rep(1, 26), pch = 0:25, xlab = "pch symbol", ylab = "", bg = "red")

# Here is the first plot again with a different symbol and color
plot(x, y, xlab = "X value", ylab = "Y value", col = "cadetblue", bg = "orange",
    pch = 25)

# We can also make line plots. Let's generate a random walk to plot as
# a line plot; note type = 'l'
val <- rep(0, 100)
for (i in 2:100) {
    val[i] <- val[i - 1] + rnorm(1, 0, 0.5)
}
# lwd adjusts the line width, values > 1 make it wider than default
plot(1:100, val, type = "l", xlab = "Step", ylab = "Value", col = "firebrick",
    lwd = 1.5)

# Finally, we can make a histogram- or barplot-style plot with type = 'h'.
# In some cases hist() or barplot() might be better options
plot(1:100, rnorm(100, 0, 1), type = "h", col = "black", xlab = "Item",
    ylab = "Value")
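
The random walk above was built with a for loop. As a possible alternative, here is a minimal sketch that builds the same kind of walk with cumsum(). It assumes the same step distribution (mean 0, sd 0.5); the names steps and walk and the call to set.seed() are additions for illustration and are not part of the original example.

```r
# Alternative sketch: build a 100-step random walk with cumsum(), no loop
set.seed(1)  # assumption: fixed seed so the walk is reproducible
steps <- rnorm(99, mean = 0, sd = 0.5)  # 99 random increments
walk <- c(0, cumsum(steps))             # start at 0, then accumulate the steps
plot(1:100, walk, type = "l", xlab = "Step", ylab = "Value", col = "firebrick",
    lwd = 1.5)
```

Because each value is the running sum of the increments, diff(walk) recovers steps, which is the same relationship the loop enforces one element at a time.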

Once you have made a plot with the plot function, you can add things to it. In the examples below, we will see how to add points and lines, a title, and a legend.

# First let's generate a vector x, which you can think of as a
# covariate in some model
x <- rnorm(200, mean = 0, sd = 1)

# Now let's assume the first 100 values belong to one group and the
# next 100 to another
grp <- rep(c(1, 2), each = 100)

# Let's assume group has some effect on a response variable, as does
# the covariate
y <- grp * 0.8 + x * 0.4 + rnorm(200, 0, 0.4)

# Now we will make an x-y scatterplot and color the points based on
# their group
plot(x, y, col = c("orange", "cadetblue")[grp], pch = 19, xlab = "X value",
    ylab = "Y value")

# Let's add a legend; the first two arguments are the x and y
# coordinates of its top-left corner
legend(-2.5, 2.8, c("group 1", "group 2"), pch = 19, col = c("orange",
    "cadetblue"))
# And a title
title(main = "My plot")

# Last, let's add the mean values of x and y for each group
g1mnX <- mean(x[grp == 1])
g1mnY <- mean(y[grp == 1])
g2mnX <- mean(x[grp == 2])
g2mnY <- mean(y[grp == 2])
# Note the use of the points function to add to the plot; cex adjusts
# the symbol size
points(g1mnX, g1mnY, pch = 21, cex = 1.8, bg = "orange")
points(g2mnX, g2mnY, pch = 21, cex = 1.8, bg = "cadetblue")
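
The legend above is placed with explicit coordinates, which only works well if you know the data range in advance. The sketch below shows one possible variation: placing the legend with a position keyword and computing both group means with tapply(). The names grpCols, mnX, and mnY and the seed are assumptions added for illustration.

```r
# Sketch: same two-group scatterplot, with the legend placed by keyword
set.seed(42)  # assumption: seed added only for reproducibility
x <- rnorm(200, mean = 0, sd = 1)
grp <- rep(c(1, 2), each = 100)
y <- grp * 0.8 + x * 0.4 + rnorm(200, 0, 0.4)
grpCols <- c("orange", "cadetblue")  # illustrative name, one color per group

plot(x, y, col = grpCols[grp], pch = 19, xlab = "X value", ylab = "Y value")
# "topleft" anchors the legend to that corner of the plot region
legend("topleft", legend = c("group 1", "group 2"), pch = 19, col = grpCols)

# tapply() applies mean() within each level of grp
mnX <- tapply(x, grp, mean)
mnY <- tapply(y, grp, mean)
points(mnX, mnY, pch = 21, cex = 1.8, bg = grpCols)
```

Other accepted keywords include "topright", "bottomleft", "bottomright", and "center".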

# Now let's plot 20 random walks
val <- matrix(0, nrow = 20, ncol = 300)
for (i in 1:20) {  # loop over 20 walks
    for (j in 2:300) {  # loop over steps
        val[i, j] <- val[i, j - 1] + rnorm(1, 0, 1)
    }
}

# Plot first line/walk, make sure y range can accommodate all walks
lb <- min(as.vector(val))
ub <- max(as.vector(val))
# Note the use of ylim to define the y limits; xlim works similarly.
# cex.lab adjusts the size of the axis labels
plot(1:300, val[1, ], type = "l", ylim = c(lb, ub), xlab = "Step", ylab = "Value",
    cex.lab = 1.2)
for (i in 2:20) {
    # lines() adds a line to the existing plot
    lines(1:300, val[i, ])
}

# We can also add a horizontal line at y = 0. This is done with the
# abline function; here h = horizontal, v = vertical
abline(h = 0, lty = 2, col = "red", lwd = 1.5)  # lty: 1 = solid, 2 = dashed, 3 = dotted
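
Drawing the first walk with plot and the rest with lines requires computing the y range by hand. A possible shortcut, sketched below, is matplot(), which draws one line per matrix column and chooses axis limits that cover all of them. This is an alternative to the approach above, not a replacement for it; the name walks, the seed, and the column-per-walk layout are assumptions made for this sketch.

```r
# Sketch: 20 random walks of 300 steps, drawn in one call with matplot()
set.seed(7)  # assumption: seed added only for reproducibility
# replicate() returns a 300 x 20 matrix: one walk per column, each starting at 0
walks <- replicate(20, cumsum(c(0, rnorm(299, 0, 1))))
# matplot() plots each column against 1:300 and sets ylim from all columns
matplot(1:300, walks, type = "l", lty = 1, col = "black", xlab = "Step",
    ylab = "Value", cex.lab = 1.2)
abline(h = 0, lty = 2, col = "red", lwd = 1.5)
```

Note that the walks are stored in columns here, whereas the loop version above stores them in rows.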

Plot layouts

Many common options for plotting are described on the help page for par. This includes options within the plot function and options for laying out plots. Let’s look at options for laying out plots briefly and make a four-panel figure.

# Here we define a layout for a 2 (row) x 2 (column) plot. Fancier
# layouts are possible with the layout function
par(mfrow = c(2, 2))

# Let's make the four plots, each a line plot of x vs x^n where n =
# 1,2,3,4
x <- 1:100
for (i in 1:4) {
    plot(x, x^i, xlab = "X value", ylab = "Power of X", type = "l", lwd = 1.3)
    title(main = paste("(", letters[i], ")"))
}

# Same as before, but force the panels to be square and adjust the
# margins
par(mfrow = c(2, 2))
par(mar = c(4, 4.5, 2, 1.5))  # bottom, left, top, right
par(pty = "s")
x <- 1:100
for (i in 1:4) {
    plot(x, x^i, xlab = "X value", ylab = "Power of X", type = "l", lwd = 1.3)
    title(main = paste("(", letters[i], ")"))
}
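
Settings made with par stay in effect for later plots on the same device. The sketch below shows one common way to handle this: par() returns the previous values of whatever it changes, so they can be saved and restored afterwards. The name oldPar is an illustrative choice, and paste0 is used here only to drop the spaces from the panel titles.

```r
# Sketch: save the current graphics settings, change them, then restore them
oldPar <- par(mfrow = c(2, 2), mar = c(4, 4.5, 2, 1.5), pty = "s")
x <- 1:100
for (i in 1:4) {
    plot(x, x^i, xlab = "X value", ylab = "Power of X", type = "l", lwd = 1.3)
    title(main = paste0("(", letters[i], ")"))
}
par(oldPar)  # later plots use the previous layout and margins again
```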

Testing your knowledge

Let’s apply what you learned. First, you will need a data set. We can load a data set that is already available in R for this. Let’s use data from a plant growth experiment. You can access the data set like this:

data(Loblolly)
Loblolly
##    height age Seed
## 1    4.51   3  301
## 15  10.89   5  301
## 29  28.72  10  301
## 43  41.74  15  301
## 57  52.70  20  301
## 71  60.92  25  301
## 2    4.55   3  303
## 16  10.92   5  303
## 30  29.07  10  303
## 44  42.83  15  303
## 58  53.88  20  303
## 72  63.39  25  303
## 3    4.79   3  305
## 17  11.37   5  305
## 31  30.21  10  305
## 45  44.40  15  305
## 59  55.82  20  305
## 73  64.10  25  305
## 4    3.91   3  307
## 18   9.48   5  307
## 32  25.66  10  307
## 46  39.07  15  307
## 60  50.78  20  307
## 74  59.07  25  307
## 5    4.81   3  309
## 19  11.20   5  309
## 33  28.66  10  309
## 47  41.66  15  309
## 61  53.31  20  309
## 75  63.05  25  309
## 6    3.88   3  311
## 20   9.40   5  311
## 34  25.99  10  311
## 48  39.55  15  311
## 62  51.46  20  311
## 76  59.64  25  311
## 7    4.32   3  315
## 21  10.43   5  315
## 35  27.16  10  315
## 49  40.85  15  315
## 63  51.33  20  315
## 77  60.07  25  315
## 8    4.57   3  319
## 22  10.57   5  319
## 36  27.90  10  319
## 50  41.13  15  319
## 64  52.43  20  319
## 78  60.69  25  319
## 9    3.77   3  321
## 23   9.03   5  321
## 37  25.45  10  321
## 51  38.98  15  321
## 65  49.76  20  321
## 79  60.28  25  321
## 10   4.33   3  323
## 24  10.79   5  323
## 38  28.97  10  323
## 52  42.44  15  323
## 66  53.17  20  323
## 80  61.62  25  323
## 11   4.38   3  325
## 25  10.48   5  325
## 39  27.93  10  325
## 53  40.20  15  325
## 67  50.06  20  325
## 81  58.49  25  325
## 12   4.12   3  327
## 26   9.92   5  327
## 40  26.54  10  327
## 54  37.82  15  327
## 68  48.43  20  327
## 82  56.81  25  327
## 13   3.93   3  329
## 27   9.34   5  329
## 41  26.08  10  329
## 55  37.79  15  329
## 69  48.31  20  329
## 83  56.43  25  329
## 14   3.46   3  331
## 28   9.05   5  331
## 42  25.85  10  331
## 56  39.15  15  331
## 70  49.12  20  331
## 84  59.49  25  331

The Loblolly data frame has 84 rows and 3 columns of records of the growth of Loblolly pine trees.

Here is your task. Generate a plot showing the growth of each of the trees (identified by Seed) over time (given by age). I suggest using a line plot with one line per tree. Use a different color for each tree and include a legend showing which color was used for each tree (seed number). You can use colors() to list the full set of named R colors and then index into it to choose a subset, or rainbow(14) to generate 14 colors spaced across the rainbow.
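
As a starting hint only, not a solution, the sketch below shows the two ways of choosing colors mentioned above. The names nTree, treeCols, and altCols are illustrative, and the particular indices into colors() are an arbitrary choice made for this sketch.

```r
# Sketch: two ways to pick one color per tree
nTree <- length(unique(Loblolly$Seed))  # number of distinct seed sources
treeCols <- rainbow(nTree)              # hues spaced evenly around the rainbow
# Arbitrary subset of the named colors: every 40th entry starting at the 10th
altCols <- colors()[seq(10, by = 40, length.out = nTree)]

# Preview the rainbow palette as a row of filled squares
plot(1:nTree, rep(1, nTree), pch = 15, cex = 2, col = treeCols,
    xlab = "Color index", ylab = "", yaxt = "n")
```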