Introduction to R

Goals

By the end of class today, you’ll know how to:

Use R as a scientific calculator
Assign a variable and view its contents
Call functions and use arguments to modify their default options
Create, use, and analyze vector objects
Work with missing data

R as a scientific calculator

At its heart, R is a supercharged scientific calculator. It includes a comprehensive set of arithmetic operators and mathematical functions. You can get results from R simply by typing an expression into the console and pressing Enter.

#addition
22 + 5

## [1] 27

#multiplication
2 * 16

## [1] 32

#division
320/16.2

## [1] 19.75309

#Exponentiation
3.4^5

## [1] 454.3542

#or
3.4 ** 5

## [1] 454.3542

# standard rules of arithmetic apply
((3 + 2.7) * 8 - 1.2)^(2/3)

## [1] 12.53878

# well-known constants, such as pi, are predefined
pi

## [1] 3.141593
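R has a few more operators that come up often in everyday calculations. The selection below is an illustrative sample, not a complete list:

```r
# Integer division and remainder (modulo)
17 %/% 5     # 3
17 %% 5      # 2

# A few common mathematical functions
sqrt(16)     # 4
abs(-7.5)    # 7.5
exp(1)       # 2.718282, Euler's number e
log10(1000)  # 3
```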

Creating objects in R

<- is the assignment operator in R. It assigns the value on the right to the object named on the left. = can also be used for assignment, but not in every context. Inside a function call, for instance, = matches a value to a named argument instead of creating an object. Generally, it is better practice to use <- for assignments.

x <- 5
x

## [1] 5

x <- 7 + 2
x

## [1] 9

y <- 18.1/2.4 + 6.2

z <- x + y
z

## [1] 22.74167

x <- x - 300
x

## [1] -291
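The difference between = and <- shows up inside function calls. The sketch below assumes a fresh R session with no existing object named digits. After each call, it uses exists() to check the global environment:

```r
# At the top level, = and <- both assign a value to a name
a <- 10
b = 10
identical(a, b)   # TRUE

# Inside a function call, = matches the value to the argument named digits;
# no object called digits is created
round(pi, digits = 2)
exists("digits", envir = globalenv(), inherits = FALSE)   # FALSE in a fresh session

# <- inside a call really does assign, as a side effect, and the value
# is then passed by position as round()'s second argument
round(pi, digits <- 2)
exists("digits", envir = globalenv(), inherits = FALSE)   # TRUE
```

This side effect is easy to miss. It is one reason to use <- for assignment and keep = for naming arguments.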

What are the values in the objects after each statement below?

mass <- 40.5 # mass?
age <- 85 # age?
mass <- mass * 2 # mass?
age <- age - 20 # age?
mass_index <- mass/age # mass_index?

Functions and their arguments

Functions are used to automate more complicated sets of commands. Functions can be built into R or loaded from R packages (which we’ll cover later). A function usually takes one or more arguments (inputs) and often returns a value. There are functions for many common numerical transformations; one example is sqrt(). Its input (argument) must be a number, and its output is the square root of that number. Executing a function is called calling the function. For example:

sqrt(4.5)

## [1] 2.12132

#Function outputs can be saved as objects
val <- sqrt(5.5)

#factorial function, e.g. factorial(3) = 3 * 2 * 1
factorial(7)

## [1] 5040

#cosine function
cos(pi)

## [1] -1

Arguments can be anything: numbers, filenames, or other objects. The meaning of an argument can differ across functions and must be looked up in the documentation (see below). If an argument is not specified, the function may use a default value. Arguments with default values are often called options, and they change the way a function operates. If you do not want the default, you must specify your own value.

#view arguments for round()
args(round)

## function (x, digits = 0, ...) 
## NULL
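round() is not the only function with a default option. Another example is log(), which computes the natural logarithm unless you supply a different base:

```r
# args() shows base = exp(1) as the default for log()
args(log)

log(100)              # 4.60517, natural log (default base e)
log(100, base = 10)   # 2, default overridden
log(8, base = 2)      # 3
```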

To view the help page for a function, you can either use ‘?’ or the help() function. Both will open the help page which gives the uses, arguments, and some examples for your function of interest. You will likely use this often during this class.

#View the full help page for round. These commands are equivalent
?round
help(round)

As args(round) shows, round() keeps zero digits after the decimal point by default (digits = 0). Let’s try specifying a different value.

round(pi, digits = 2)

## [1] 3.14

You’ll notice the first argument for round() is x. We didn’t write x = in the call above because R matches unnamed arguments by position, and pi was given first. Still, we can name it if we’d like. Positional matching is also why round(pi, 2) works without naming either argument.

round(x = pi,digits = 2)

## [1] 3.14

round(pi,2)

## [1] 3.14

Named arguments can appear in any order, not just the order listed on the help page, as long as each one is named.

round(digits = 3, x = pi)

## [1] 3.142
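You can also mix named and positional arguments in one call. R first matches arguments by name. It then fills the remaining arguments in order with the unnamed values, so the call below still treats 2.567 as x:

```r
round(2.567)               # 3, default digits = 0
round(digits = 1, 2.567)   # 2.6, digits matched by name, x by position
```

This style works, but the call is easier to read if x comes first and only the options are named.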

round() also accepts negative values for digits. These round to the nearest ten, hundred, and so on.

round(x = 2452346, digits = -2)

## [1] 2452300

Vectors and data types

Vectors are the most common data structure in R. A vector is a series of values that all share one type, such as numbers or character strings. We assign a series of values to a vector using the c() function, which combines its arguments. We can, for example, create a vector of body weights and assign it to a new object weight_g:

weight_g <- c(55.0,45.4,68.7,22,15.65)
weight_g

## [1] 55.00 45.40 68.70 22.00 15.65
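A vector holds values of a single type. If you combine different kinds of values with c(), R converts them all to a common type. For example:

```r
# A number, a string, and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
mixed          # "1" "a" "TRUE"
class(mixed)   # "character"

# Numbers and logicals combine as numbers (TRUE becomes 1)
c(2.5, TRUE)   # 2.5 1.0
```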

#some functions take vectors as input 
sum(weight_g)

## [1] 206.75

mean(weight_g)

## [1] 41.35

median(weight_g)

## [1] 45.4

summary(weight_g)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.65   22.00   45.40   41.35   55.00   68.70

Arithmetic in R is vectorized. This means that a single expression applies to every element of a vector, so you do not have to handle elements one at a time. Let’s try it.

#A shortcut to create a vector of integers from 1 to 20
x <- 1:20
x

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

x + 3.5

##  [1]  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5 18.5
## [16] 19.5 20.5 21.5 22.5 23.5

x * 8.7

##  [1]   8.7  17.4  26.1  34.8  43.5  52.2  60.9  69.6  78.3  87.0  95.7 104.4
## [13] 113.1 121.8 130.5 139.2 147.9 156.6 165.3 174.0

2^x

##  [1]       2       4       8      16      32      64     128     256     512
## [10]    1024    2048    4096    8192   16384   32768   65536  131072  262144
## [19]  524288 1048576
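Many built-in math functions are vectorized in the same way, so they apply to every element without extra code. A short sketch with an example vector v:

```r
v <- c(4, 9, 16, 25)
sqrt(v)           # 2 3 4 5
v / 2 + 1         # 3.0  5.5  9.0 13.5
round(v / 3, 1)   # 1.3 3.0 5.3 8.3
```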

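Vectors can be added to, subtracted from, or otherwise combined with other vectors. These operations work element by element: the first element of one vector is combined with the first element of the other, and so on. If one vector is shorter than the other, the shorter vector is recycled (R goes back to its first element after it runs out). This might be what you want sometimes, but be careful: R only warns you when the longer length is not a multiple of the shorter one.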

A <- 1:10 - 1.3

B <- 11:20

AB <- A + B
AB

##  [1] 10.7 12.7 14.7 16.7 18.7 20.7 22.7 24.7 26.7 28.7

D <- c(100,15,2.3,pi,0)
D * AB

##  [1] 1070.00000  190.50000   33.81000   52.46460    0.00000 2070.00000
##  [7]  340.50000   56.81000   83.88052    0.00000

D <- 1:7
D * AB

## Warning in D * AB: longer object length is not a multiple of shorter object
## length

##  [1]  10.7  25.4  44.1  66.8  93.5 124.2 158.9  24.7  53.4  86.1
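Recycling is also what makes expressions like x + 3.5 work: R reuses the single value for every element. When the longer length is an exact multiple of the shorter one, recycling happens silently, so a quick length check can help. In this sketch, a length-2 vector alternates the signs of a length-6 vector:

```r
v <- 1:6
signs <- c(1, -1)

# 6 is a multiple of 2, so R recycles signs without a warning
length(v) %% length(signs) == 0   # TRUE
v * signs                         # 1 -2  3 -4  5 -6
```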

There are several other ways to create vectors besides c() and :. Two common examples are the rep() and seq() functions.

help(rep)
rep(1,10)

##  [1] 1 1 1 1 1 1 1 1 1 1

rep(1:3,10)

##  [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

rep(1:3,each = 10)

##  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3

rep(c(1.2, 7.9, 4.2), 5)

##  [1] 1.2 7.9 4.2 1.2 7.9 4.2 1.2 7.9 4.2 1.2 7.9 4.2 1.2 7.9 4.2

?seq
seq(from = 1, to = 100, by = 2)

##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
## [26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99

seq(from = 2, by = 3, length.out = 50)

##  [1]   2   5   8  11  14  17  20  23  26  29  32  35  38  41  44  47  50  53  56
## [20]  59  62  65  68  71  74  77  80  83  86  89  92  95  98 101 104 107 110 113
## [39] 116 119 122 125 128 131 134 137 140 143 146 149
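These functions support a few more useful variations: length.out combined with from and to, times given as a vector, and times combined with each. seq_len() is a related helper for the sequence 1 to n:

```r
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
seq_len(5)                              # 1 2 3 4 5

rep(c("a", "b"), times = c(2, 3))       # "a" "a" "b" "b" "b"
rep(1:2, times = 2, each = 2)           # 1 1 2 2 1 1 2 2
```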

There are many functions that can be used to inspect the content of a vector. length() allows us to see how many elements are in a vector.

x <- rep(2,7)
y <- length(x)
y

## [1] 7
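Besides length(), several other functions summarize or rearrange a vector's contents. Here they are applied to a small example vector v:

```r
v <- c(5, 3, 8, 3, 1)
range(v)     # 1 8, smallest and largest values
unique(v)    # 5 3 8 1, duplicates dropped
sort(v)      # 1 3 3 5 8
rev(v)       # 1 3 8 3 5, reversed order
head(v, 2)   # 5 3, first two elements
tail(v, 2)   # 3 1, last two elements
```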

Objects have types based on their contents. Common types in R include:

integer for whole numbers
numeric or double for floating-point numbers
logical for TRUE and FALSE (boolean)
character for text strings
vector, the general structure that holds values of any of the types above (not a type of its own)

#Let's see what type this vector is
x <- 1:20
is(x)

## [1] "integer"             "double"              "numeric"            
## [4] "vector"              "data.frameRowLabels"

a <- 2
is(a)

## [1] "numeric" "vector"

#To be specific:
is.numeric(x)

## [1] TRUE

#To be specific:
is.numeric(a)

## [1] TRUE

is.vector(x)

## [1] TRUE

is.vector(a)

## [1] TRUE

# other data types
pets <- c("dog", "cat", "rat")
is(pets)

## [1] "character"           "vector"              "data.frameRowLabels"
## [4] "SuperClassMethod"

tf <- c(TRUE,TRUE,FALSE)
is(tf)

## [1] "logical" "vector"
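is() lists every class an object belongs to, which can be more than you need. typeof() returns only the underlying storage type, and class() returns the main class. For example:

```r
typeof(1:20)            # "integer"
typeof(2)               # "double", plain numbers are stored as doubles
typeof(2L)              # "integer", the L suffix makes an integer literal
typeof(c(TRUE, FALSE))  # "logical"
typeof("dog")           # "character"
class(2)                # "numeric"
```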

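Mixing different types can lead to errors. For example, you cannot add a number to a character string: 1 + "a" gives an error. However, logical values (TRUE and FALSE) are converted to 1 and 0 in arithmetic, so they can be used in mathematical statements. Comparison operators (==, !=, <, <=, >, >=) return logical values, and & (and) and | (or) combine them: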

3 == 2

## [1] FALSE

3 < 5

## [1] TRUE

x <- 1:5 
2 <= x

## [1] FALSE  TRUE  TRUE  TRUE  TRUE

(3 == 1) | (4 == 4)

## [1] TRUE

(3 == 1) & (4 == 4)

## [1] FALSE

5 != x

## [1]  TRUE  TRUE  TRUE  TRUE FALSE

x <- 1:3
y <- c(1,5,3)
x == y

## [1]  TRUE FALSE  TRUE

# now using logicals for arithmetic
x <- 1:8 < 2
x

## [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

x + 3

## [1] 4 3 3 3 3 3 3 3

sum(x)

## [1] 1
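Because TRUE counts as 1 and FALSE as 0, mean() of a logical vector gives the proportion of TRUE values. The second part of this sketch shows the error from adding a number to a character string. It uses tryCatch() to capture the error message so you can inspect it without stopping the script:

```r
passed <- c(TRUE, FALSE, TRUE, TRUE)
as.numeric(passed)   # 1 0 1 1
sum(passed)          # 3, number of TRUE values
mean(passed)         # 0.75, proportion of TRUE values

# Adding a number to a string fails; keep the error message instead
msg <- tryCatch(1 + "a", error = function(e) conditionMessage(e))
msg   # "non-numeric argument to binary operator"
```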

You can index (pull out specific elements) and subset vectors using [] notation. Indexing in R starts at 1. For example:

x <- seq(from = 2, to = 100, by = 2)
x

##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [39]  78  80  82  84  86  88  90  92  94  96  98 100

x[1]

## [1] 2

x[10]

## [1] 20

x[1:10]

##  [1]  2  4  6  8 10 12 14 16 18 20

x[c(1,4,6,7)]

## [1]  2  8 12 14

#Using logic:
x[x > 14]

##  [1]  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52
## [20]  54  56  58  60  62  64  66  68  70  72  74  76  78  80  82  84  86  88  90
## [39]  92  94  96  98 100

#Combining multiple logical statements
y <- x[x > 12 & x < 48]
y

##  [1] 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46

# another example
animals <- c("mouse", "rat", "dog", "cat", "cat")
# return both rat and cat
animals[animals == "cat" | animals == "rat"]

## [1] "rat" "cat" "cat"
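A few other indexing tools are handy:

%in% tests whether each element belongs to a set, which shortens the animals example above.
A negative index drops elements.
which() returns the positions where a condition is TRUE.

```r
animals <- c("mouse", "rat", "dog", "cat", "cat")
animals[animals %in% c("cat", "rat")]   # "rat" "cat" "cat"

x <- seq(from = 2, to = 20, by = 2)
x[-1]          # every element except the first
x[-(1:3)]      # drop the first three: 8 10 12 14 16 18 20
which(x > 14)  # 8 9 10, positions rather than values
```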

Missing data

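Missing data are common in real datasets, so R has tools for working with them. Missing values are represented in vectors as NA. Many summary functions return NA if any input is NA, unless you tell them to drop missing values with na.rm = TRUE.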

heights <- c(2,3,4,NA,6)
mean(heights)

## [1] NA

max(heights)

## [1] NA

# na.rm = TRUE removes NA values before computing the mean
mean(heights, na.rm = TRUE)

## [1] 3.75

#logic test of NA
is.na(heights)

## [1] FALSE FALSE FALSE  TRUE FALSE

# keep only the non-missing values (! negates the logical test)
heights[!is.na(heights)]

## [1] 2 3 4 6
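NA means "unknown", so comparing anything to NA, even NA itself, gives NA. That is why you need is.na() rather than ==. You can then count and summarize missing values with the logical techniques from earlier:

```r
heights <- c(2, 3, 4, NA, 6)
NA == NA                     # NA, not TRUE
heights == NA                # NA NA NA NA NA, so == cannot find missing values
sum(is.na(heights))          # 1, number of missing values
mean(is.na(heights))         # 0.2, proportion missing
max(heights, na.rm = TRUE)   # 6
```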

R Scripts

We have mostly been typing commands into the R console. This works, but you will often want to save commands in a file (a script), which you can modify and rerun as often as you like. You can create R script files directly within RStudio, or open an existing R script. To run part of a script, highlight the lines you want and press the Run button or Ctrl+Enter. The Source button runs your entire script.
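In RStudio, the Source button runs the saved file through the source() function. To roughly sketch what that does, the code below writes a tiny two-line script to a temporary file and then sources it. The temporary file stands in for a script you saved yourself:

```r
# Create a throwaway script file; tempfile() only generates a path
script_path <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "b <- a * 3"), script_path)

# Run every line of the script, as the Source button does
source(script_path)
b   # 6
```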

Some exercises to assess your understanding

For practice, complete the following tasks and save the code in an R script.

Create a vector, y, with all positive even integers less than 200.
Subset the vector to retain only those integers > 50.
Subset the vector y to retain only integers that, when squared, are > 200.
Compute the mean and standard deviation of the values you retained.
Divide all elements in y by π. Compute the mean of the result, rounded to one decimal place.