Programming Project 6

Overview

In this assignment, we will use AlphaFold to estimate the 3D structure of a protein of your choice and then visualize that structure in R. There are many details and options that we will not have time to delve into. Rather, this assignment is meant to give you some exposure to aspects of computational biology, including approaches and applications, that we did not get to spend much (or any) time on. You will need to write very little code to complete this assignment. Instead, follow the steps below carefully and submit a file with an image of your protein’s 3D structure.

Step 1

Obtain a protein amino acid sequence (done in class Tuesday).

Choose and download the amino acid sequence (primary structure) of a protein of your choice from NCBI.

Go to NCBI’s main database page (https://www.ncbi.nlm.nih.gov/). Select “Protein” in the menu next to the search bar (it will initially say “All Databases”), then search the protein database for the protein of your choice. Key terms to include are the name of the protein you are after (e.g., “cytochrome oxidase 1”) and the species you want (e.g., “mouse”).

You will likely obtain multiple hits. Find one that matches what you were actually searching for, and try to choose one labeled as “complete” or at least one with a reasonable number of amino acids (>50). Click on the record. Obtain the FASTA-format amino acid sequence for the protein by clicking FASTA at the top of the window.
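If you save the sequence to a file and have access to a UNIX-style command line, the sketch below shows one way to check how many amino acids the record contains. A FASTA record is a header line starting with ">" followed by the sequence. The file name my_protein.fasta, the header text, and the sequence are all made up for illustration.

```shell
# Count the amino acid residues in a single-record FASTA file.
# Header lines (starting with ">") are skipped; spaces and line breaks are dropped.
count_residues() {
  grep -v '^>' "$1" | tr -d ' \t\r\n' | wc -c | tr -d ' '
}

# Hypothetical example file (made-up header and sequence, not a real protein).
cat > my_protein.fasta <<'EOF'
>example_id hypothetical protein [made-up species]
MKTAYIAKQR
QISFVKSHFS
EOF

n=$(count_residues my_protein.fasta)
echo "my_protein.fasta has $n residues"

# Flag records that fall at or below the >50 guideline mentioned above.
if [ "$n" -le 50 ]; then
  echo "Warning: 50 or fewer residues; consider choosing a longer record"
fi
```

With your own download, skip the part that writes the example file and call count_residues on the file you saved.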

Upload your protein sequence here.

Step 2

Use AlphaFold to infer the 3D protein structure.

This step requires non-trivial computational resources (more than the average laptop provides), including access to GPUs. I will run this step for everyone’s protein sequence on the USU/UofU Center for High Performance Computing cluster. I will do this Wednesday, so you must have your amino acid sequence uploaded by then. The job is run through a UNIX command-line interface, and the commands it needs are specified in the shell script below, including the commands that request computational resources and the command that actually runs the job. You do not need to understand the details of this script for this course, but I will walk you through it in class.

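#!/bin/bash
# Request up to 8 hours of wall time and 16 tasks on 1 node
#SBATCH -t 8:00:00
#SBATCH -n 16
#SBATCH -N 1
# Partition and account to run the job under
#SBATCH -p notchpeak-shared-short
#SBATCH -A notchpeak-shared-short
# Request one 1080 Ti GPU
#SBATCH --gres=gpu:1080ti:1

# Move to the project directory (the FASTA file path below is relative to it)
cd /uufs/chpc.utah.edu/common/home/gompert-group4/projects/compbio_colabfold

# Load ColabFold and set the input file and output directory
module load colabfold
export FASTA_FILE=mouse_co1.fasta
export OUTPUT_DIR=results
# Predict the structure: Amber relaxation on the GPU, templates, 3 recycles
colabfold_batch --amber --templates --num-recycle 3 --use-gpu-relax "$FASTA_FILE" "$OUTPUT_DIR"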

Your .pdb file containing your folded protein can be found here. The file names are loosely based on the identifier you submitted your protein under. To download a file, click ‘Raw’ in the upper right-hand corner, then press Ctrl+S to save it. Check the extension: if the file name does not end exactly with .pdb (e.g., it ends with .pdb.txt), rename it so that it does.

DNA_helicase.pdb
GPCR.pdb
adamalysin_II.pdb
clathrin.pdb
green_fluorescent_protein.pdb
heme_oxygenase.pdb
histone_lysine_demethylase_PHF8.pdb
lipase_mf.pdb
synthetic_construct.pdb
tRNA_synthetase.pdb
zinc_metalloproteinase_disintegrin-like_VAP1.pdb
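If your browser saved the file with an extra .txt extension, you can rename it by hand in your file browser. For those comfortable with a UNIX-style command line, here is a minimal sketch of the same fix. The function name fix_pdb_extension is made up for this example, and the demonstration runs on an empty placeholder file in a scratch directory rather than on a real download.

```shell
# Strip a trailing ".txt" from every file named like "*.pdb.txt" in a directory.
fix_pdb_extension() {
  dir="$1"
  for f in "$dir"/*.pdb.txt; do
    [ -e "$f" ] || continue     # nothing matched: the pattern is left as-is, so skip it
    mv -- "$f" "${f%.txt}"      # e.g., GPCR.pdb.txt becomes GPCR.pdb
  done
}

# Demonstration in a scratch directory with an empty placeholder file.
demo_dir=$(mktemp -d)
touch "$demo_dir/GPCR.pdb.txt"
fix_pdb_extension "$demo_dir"
ls "$demo_dir"
```

To apply it to a real download, call fix_pdb_extension on the folder your browser saves to (for example ~/Downloads, assuming that is your download location).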

Step 3

Visualize the 3D structure in R.

There are many neat tools for visualizing protein structures, and many have interactive components. We will do this within R using two packages: NGLVieweR, which is designed for viewing protein structures, and shiny, which will let us make an interactive, web-based visualization. You will need to install both packages (with install.packages()) first.

The complete code for the visualization is below. You only need to change “~/Downloads/CAD54434.1_cytochrome_c_oxidase_subunit_1__mitochondrion___Mus_musculus__relaxed_rank_1_model_2.pdb” to the corresponding path and file name for your pdb (protein structure) file from AlphaFold. Once you create the visualization, play with it some. You can zoom in and out and rotate the spinning image. The image should show the individual atoms comprising the protein as well as secondary structures, e.g., alpha helices and beta sheets.

Once you have had your fill, save and upload an image of the protein visualization. You can do this either by printing the screen to a pdf or by simply taking a screenshot. Uploading your image will result in full credit for this assignment. You can also use this R Shiny web application to save an image: https://niels-van-der-velden.shinyapps.io/shinyNGLVieweR/.

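library(shiny)
library(NGLVieweR)

# Page with a single output slot for the 3D viewer
ui <- fluidPage(NGLVieweROutput("structure"))

server <- function(input, output) {
  output$structure <- renderNGLVieweR({
    # Change this path to point to your own .pdb file
    NGLVieweR("~/Downloads/CAD54434.1_cytochrome_c_oxidase_subunit_1__mitochondrion___Mus_musculus__relaxed_rank_1_model_2.pdb") %>%
      # Ribbon view of the secondary structure, colored by position along the chain
      addRepresentation("cartoon",
                        param = list(
                          name = "cartoon",
                          colorScheme = "residueindex"
                        )) %>%
      # Individual atoms and bonds, colored by chemical element
      addRepresentation("ball+stick",
                        param = list(
                          name = "ball+stick",
                          colorScheme = "element"
                        )) %>%
      stageParameters(backgroundColor = "black") %>%
      setQuality("high") %>%
      setFocus(0) %>%
      # Rotate the structure continuously
      setSpin(TRUE)
  })
}

# Launch the interactive app
shinyApp(ui, server)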
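As an optional sanity check on your .pdb file, you can count its alpha-carbon (CA) atoms. Each amino acid has one, so the count should match the length of the sequence you submitted. The sketch below assumes standard PDB column layout (record type in columns 1-6, atom name in columns 13-16). The function name count_ca_atoms is made up, and the five ATOM lines are invented coordinates for illustration, not output from AlphaFold.

```shell
# Count alpha-carbon atoms, one per residue, in a PDB file.
# Uses fixed columns: record name in columns 1-6, atom name in columns 13-16.
count_ca_atoms() {
  awk 'substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA "' "$1" | wc -l | tr -d ' '
}

# Hypothetical two-residue fragment with made-up coordinates.
demo_pdb=$(mktemp)
cat > "$demo_pdb" <<'EOF'
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C
ATOM      3  C   MET A   1      26.913  26.639   3.531  1.00  9.62           C
ATOM      4  N   LYS A   2      26.335  27.770   3.258  1.00  9.62           N
ATOM      5  CA  LYS A   2      26.850  29.021   3.898  1.00  9.62           C
EOF

n_ca=$(count_ca_atoms "$demo_pdb")
echo "Residues (CA atoms) found: $n_ca"
```

To check your own structure, call count_ca_atoms with the path to your .pdb file in place of the demonstration file.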

BIOL 3070

Due Dec. 10th, 2024