Goals

After working through this handout, you should:

Reading data into R

Many formats and standards exist for storing data in text documents. We will focus on data stored in plain text documents with either white-space (i.e. space, tab, etc.) or comma-separated data values. The data may or may not have headers. Many data are initially stored in spreadsheet software, such as LibreOffice, Excel, or Google Sheets, but all of these have options to export the data from a single spreadsheet as a simple comma-separated text file (.csv) that can be read into R.

The central function for reading data in R is read.table. This function reads a file (in table format) and creates a data frame (more below). Key arguments for this function are the file name (along with the path if necessary), a logical argument (header) indicating whether the file contains a header row, and the character(s) used to separate columns (sep, e.g. sep = ',' for csv files). Let’s try this with an example.

#Read data from a text file in the working directory
ChTrDat <- read.table("CherryTrees.txt", header = TRUE)
#Or give the full path to the file
ChTrDat <- read.table("~/Documents/compbio/CherryTrees.txt", header = TRUE)
#Check where I am
getwd()
## [1] "/home/briankissmer/Documents/compbio"
#Change where I am
setwd("~/Documents/compbio")

In R, it is generally good practice to set your working directory to the folder with your data, which is the same folder you might want to output results (figures etc.) to. However, in RStudio you can also load a data file into your environment directly. This is fine if you only want to use R through the RStudio interface, or do not mind spending time clicking a bunch of buttons. Another note: the function read.csv works similarly to read.table, but by default it expects comma-separated values and a header.
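To make the relationship between read.table and read.csv concrete, here is a minimal sketch. The file contents and the column names (Species, Count) are made up for illustration and are written to a temporary file, so nothing here depends on CherryTrees.txt.

```r
# Write a small, made-up CSV file to a temporary location (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Species,Count", "oak,12", "maple,7"), tmp)

# read.table needs the header and the separator spelled out for a CSV file
dat1 <- read.table(tmp, header = TRUE, sep = ",")

# read.csv assumes header = TRUE and sep = "," by default
dat2 <- read.csv(tmp)

# For a simple file like this one, both calls should give the same data frame
identical(dat1, dat2)
```

For a white-space separated file, read.table with its default sep would be the natural choice instead.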

Data frames

When you read data into R with read.table you end up with a data frame. This is a data object type we have used, but have not discussed much. A data frame shares some features with a list and some features with a matrix. It is a useful object type for spreadsheet-like data. Similar to a matrix, a data frame has rows and columns. However, unlike a matrix (and like a list) the different columns can have different types of data. That is, one column can have numeric data and another can have text data, etc. Thus, a data frame combines the mixed data types of a list with the matrix-like feature of every column having the same number of rows.

#Read data from a text file
ChTrDat <- read.table("CherryTrees.txt", header = TRUE)
is(ChTrDat)
## [1] "data.frame" "list"       "oldClass"   "vector"
#Get the number of columns
length(ChTrDat)
## [1] 3
#Access the Girth column
ChTrDat$Girth
##  [1]  8.3  8.6  8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0
## [16] 12.9 12.9 13.3 13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0
## [31] 20.6
#Numerical indexing also works
ChTrDat[, 1]
##  [1]  8.3  8.6  8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0
## [16] 12.9 12.9 13.3 13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0
## [31] 20.6
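
The list-like and matrix-like sides of a data frame can also be seen on a small example. This is only a sketch: the data frame df and its columns (height, label) are made up for illustration and are not part of the cherry tree data.

```r
# A small, made-up data frame with one numeric and one text column
df <- data.frame(height = c(70, 65, 63), label = c("a", "b", "c"))

# Each column keeps its own type
sapply(df, class)

# Converting to a matrix forces a single type for all values, here character
m <- as.matrix(df)
class(m[, "height"])

# Like a list, a data frame can be indexed by column name with [[ ]]
df[["height"]]

# Like a matrix, it has dimensions (rows, columns)
dim(df)
```

The conversion with as.matrix shows why a matrix is a poor fit for mixed data: the numeric column has to be stored as text to match the other column.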
