Programming Project 1 - Simulating Evolution

Overview

In this assignment, you will simulate evolution by drift and selection under a Wright-Fisher model. As such, these simulations examine evolution as a stochastic process. From a programming perspective, this assignment covers everything we have done so far: for loops, vectors and matrices, probability distributions, indexing, and graphing.

You do not need to turn in your actual results and graphs, but you should check them to confirm that your code works. Instead, submit your R code as an R script (.R file) as shown in the Week 1 Handout. Before submission, use the ‘Source’ icon to confirm that the entire script runs correctly and without errors. Your script must contain annotations (i.e., comments) to demonstrate that you know what your code is doing. The example code in Task 2 contains an appropriate amount of comments if you would like a reference. You can find the rubric I will use to assess this assignment under the Week 3 Resources tab on the course website.

Task 1

Simulate evolution by genetic drift (only) under the Wright-Fisher model, assuming a diploid population of 100 individuals (2N = 200) and an initial allele frequency of p = 0.1. Track p over 1000 generations (including the initial generation). Store the allele frequency for each generation in a numeric vector of length 1000.
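
As a hint for the core step, one generation of drift under the Wright-Fisher model can be sketched as a single binomial draw of the 2N gene copies. The variable names below (n_copies, p_now, copies_next, p_next) are illustrative and not part of the assignment, and the sketch covers one generation only, not the full simulation.

```r
# Illustrative sketch of ONE generation of drift (hypothetical variable names)
set.seed(1)        # fixed seed so the example is repeatable
n_copies <- 200    # 2N gene copies for a diploid population of N = 100
p_now <- 0.1       # allele frequency in the current generation

# Draw how many of the 2N copies in the next generation carry the allele
copies_next <- rbinom(n = 1, size = n_copies, prob = p_now)

# Convert the count back into a frequency between 0 and 1
p_next <- copies_next / n_copies
p_next
```

Repeating this step inside a for loop, with each new frequency feeding the next draw, gives the trajectory that Task 1 asks you to store.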

Task 2

Conduct 50 simulations of evolution using the same conditions as described above, but store the output in a matrix with 50 rows (simulations) and 1000 columns (generations). Write code to plot these results. Specifically, create a plot with time in generations on the x-axis and allele frequency on the y-axis. Make sure the range of the y-axis is set from 0 to 1. The results from each simulation should be shown as an individual line that depicts the allele frequency over time. There should therefore be 50 lines, covering 1000 generations each. The code below shows an example of creating a plot with multiple lines from a matrix; you may find it useful.

# First, create a matrix with 5 rows and 20 columns as an example.
# This is accomplished by taking a random sample from a normal distribution.
X <- matrix(rnorm(5 * 20, mean = 0, sd = 1), nrow = 5, ncol = 20)

# Now, let's make the plot. We plot row 1 first because plot() creates the
# axes; ylim is set here so the y limits cover the sampled values in all rows.
plot(1:20, X[1, ], type = 'l', xlab = "Iteration", ylab = "Value", ylim = c(-4, 4))

# Now use a for loop to add the remaining rows with the lines() function
for(i in 2:5){
  lines(1:20, X[i, ])
}
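
If you want to check your loop-based plot, base R's matplot() function draws one line per matrix column in a single call. The sketch below is an optional alternative, not a required approach; it rebuilds the same kind of example matrix and transposes it with t() so that each row becomes a column.

```r
# Optional cross-check: draw all rows of a matrix at once with matplot()
X <- matrix(rnorm(5 * 20, mean = 0, sd = 1), nrow = 5, ncol = 20)

# matplot() plots the COLUMNS of its second argument, so transpose X first
X_t <- t(X)   # now 20 rows (iterations) by 5 columns (one per line)

matplot(1:20, X_t, type = "l", lty = 1, col = "black",
        xlab = "Iteration", ylab = "Value", ylim = c(-4, 4))
```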

Task 3

Repeat Task 2 in its entirety with a diploid population of 10 individuals (2N = 20).

Task 4

Repeat Task 2 again, but now with both selection and genetic drift. Assume selection favors the A allele, which has an initial frequency of p = 0.1. The selection coefficient is s = 0.1.
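
One common way to combine the two forces is to adjust the allele frequency for selection first and then apply drift by binomial sampling. The sketch below assumes a simple model in which each copy of A has relative fitness 1 + s and each copy of the other allele has fitness 1; this fitness scheme and the variable names (p_sel, copies_next, p_next) are assumptions for illustration, so use the model from the course notes if it differs.

```r
# Illustrative sketch of ONE generation with selection, then drift
# ASSUMPTION: allele A has relative fitness 1 + s, the other allele has fitness 1
set.seed(1)        # fixed seed so the example is repeatable
n_copies <- 200    # 2N gene copies for a diploid population of N = 100
p <- 0.1           # frequency of A before selection
s <- 0.1           # selection coefficient

# Selection: weight each allele by its fitness and renormalize
p_sel <- p * (1 + s) / (p * (1 + s) + (1 - p))
p_sel              # about 0.109, slightly higher than p

# Drift: sample the next generation using the post-selection frequency
copies_next <- rbinom(n = 1, size = n_copies, prob = p_sel)
p_next <- copies_next / n_copies
```

Setting s to 0 in this sketch leaves p_sel equal to p, which recovers the drift-only case from Task 2.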
