After working through this handout, you should:
Be able to get data from a text file into R
Be comfortable with data frames
Many formats and standards exist for storing data in text documents. We will focus on data stored in plain text documents with either white-space (i.e. space, tab, etc.) or comma-separated data values. The data may or may not have headers. Many data are initially stored in spreadsheet software, such as LibreOffice, Excel, or Google Sheets, but all of these have options to export the data from a single spreadsheet as a simple comma-separated text file (.csv) that can be read into R.
The central function for reading data in R is
read.table. This function reads a file in table format
and creates a data frame (more below). Key arguments for this function
are the file name (along with the path if necessary), a logical argument
(header) indicating whether the file contains a header row, and the
character(s) used to separate columns (sep, e.g. sep = ','
for csv files). Let’s try this with an example.
#Read data from a text file in the working directory
ChTrDat <- read.table("CherryTrees.txt", header = TRUE)
#Or give the path to the file explicitly
ChTrDat <- read.table("~/Documents/compbio/CherryTrees.txt", header = TRUE)
#Check where I am
getwd()
## [1] "/home/briankissmer/Documents/compbio"
#Change where I am
setwd("~/Documents/compbio")
In R, it is generally good practice to set your working directory to
the folder that holds your data, which is often the same folder you want
to write results (figures, etc.) to. However, in RStudio you
can also load a data file into your environment directly through the
interface. This is fine if you only want to use R through RStudio
and do not mind spending time clicking through menus.
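Before reading a file, it can help to confirm that R is looking where you think it is. The following is a minimal sketch using base R functions; the file name is only an illustration and may differ from yours.

```r
# File name used for illustration; replace with your own file
f <- "CherryTrees.txt"

# Is the file present in the current working directory?
found <- file.exists(f)
found

# List everything in the current working directory
files_here <- list.files()
files_here

# Build the full path R would use for this file
full_path <- file.path(getwd(), f)
full_path
```

If file.exists returns FALSE, either change the working directory with setwd or pass the full path to read.table.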
Note also that the function read.csv works similarly to
read.table, but by default expects
comma-separated values and a header.
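To see the relationship between the two functions, here is a small sketch that writes a made-up csv file to a temporary location and reads it both ways. The file contents and the object names (tmp, d1, d2) are assumptions for illustration only.

```r
# Write a small, made-up csv file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,height", "a,1.5", "b,2.0"), tmp)

# read.csv assumes a header row and comma separators by default
d1 <- read.csv(tmp)

# read.table needs both stated explicitly
d2 <- read.table(tmp, header = TRUE, sep = ",")

# Compare the two results
all.equal(d1, d2)

# Remove the temporary file
unlink(tmp)
```

For simple files like this one the two calls should give the same data frame, so the choice between them is mostly a matter of convenience.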
When you read data into R with read.table you end up
with a data frame. This is a data object type we have used, but have
not discussed much. A data frame shares some features with a list and
some features with a matrix. It is a useful object type for
spreadsheet-like data. Similar to a matrix, a data frame has rows and columns.
However, unlike a matrix (and like a list) the different columns can
have different types of data. That is, one column can have numeric data
and another can have text data, etc. Thus, a data frame combines the
mixed data types of a list with the matrix-like feature of every column
having the same number of rows.
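The contrast with a matrix can be sketched with a small, made-up data frame (the names df and m and the values are assumptions for illustration, not part of the cherry tree data).

```r
# A small, made-up data frame with one text column and one numeric column
df <- data.frame(species = c("oak", "elm", "ash"),
                 height = c(20.1, 15.3, 18.7),
                 stringsAsFactors = FALSE)

# Like a list: each column keeps its own type
sapply(df, class)

# Like a matrix: it has rows and columns
dim(df)

# A matrix, by contrast, coerces everything to a single type,
# so the numbers are stored as text here
m <- cbind(c("oak", "elm", "ash"), c(20.1, 15.3, 18.7))
class(m[, 2])
```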
#Read data from a text file
ChTrDat <- read.table("CherryTrees.txt", header = TRUE)
is(ChTrDat)
## [1] "data.frame" "list" "oldClass" "vector"
#Get the number of columns
length(ChTrDat)
## [1] 3
#Access the Girth column
ChTrDat$Girth
## [1] 8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0
## [16] 12.9 12.9 13.3 13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0
## [31] 20.6
#Numerical indexing also works
ChTrDat[, 1]
## [1] 8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0
## [16] 12.9 12.9 13.3 13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0
## [31] 20.6
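A column can be reached in several equivalent ways. The sketch below uses a small, made-up data frame (df and its Height values are assumptions for illustration) rather than the full file.

```r
# A small, made-up data frame for illustration
df <- data.frame(Girth = c(8.3, 8.6, 8.8),
                 Height = c(70, 65, 63))

# Four ways to extract the same column
by_dollar <- df$Girth          # by name with $
by_position <- df[, 1]         # by column number
by_label <- df[, "Girth"]      # by column name in matrix-style indexing
by_list <- df[["Girth"]]       # by name in list-style indexing

# All four return the same numeric vector
identical(by_dollar, by_position)
identical(by_label, by_list)
```

Indexing by name is usually safer than by position, because it still works if the column order changes.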