Known Structure, Unknown Function: An Inquiry-based Undergraduate Biochemistry Laboratory Course

Undergraduate biochemistry laboratory courses often do not provide students with an authentic research experience, particularly when the express purpose of the laboratory is purely instructional. However, an instructional laboratory course that is inquiry- and research-based could simultaneously impart scientific knowledge and foster a student's research expertise and confidence. We have developed a year-long undergraduate biochemistry laboratory curriculum wherein students determine, via experiment and computation, the function of a protein of known three-dimensional structure. The first half of the course is inquiry-based and modular in design; students learn general biochemical techniques while gaining preparation for research experiments in the second semester. Having learned standard biochemical methods in the first semester, students independently pursue their own (original) research projects in the second semester. This new curriculum has yielded an improvement in student performance and confidence as assessed by various metrics. To disseminate teaching resources to students and instructors alike, a freely accessible Biochemistry Laboratory Education resource is available at http://biochemlab.org.


[jobDirectory]
This is how we will refer to the directory where all of your work will be done for a single job. (Think of a job as one small, self-contained unit of work; for example, it would be one replicate if you were pipetting many solutions to repeat a wet-lab experiment in triplicate. In computational biology, you would say that you performed the calculation, or job, three times.) You will create your [jobDirectory] in the next step of this tutorial, and you will need to navigate to it on several occasions.

text
We will use this formatting to highlight many words throughout the tutorial. This font indicates one of two things, depending on context. In the first case, you are looking for a button, field or file called some name. In the other case, you will be typing a text command using the keyboard. In both cases, it is text that you should find verbatim, unless...

[text]
Text with [square brackets around it] will be text that is not precisely the same for every use. This will be such things as PDB codes or ligand names, which will generally differ for each job.

GUI
This stands for Graphical User Interface, which is how you usually interact with your computer.

PGUI
This is a denotation that will be used when the command is in the small gray PyMOL box containing the File menu.

VGUI
This is a denotation that will be used when the command is in the PyMOL Viewer. Most commands given here will be on the right side panel (the graphical menu of buttons).

AGUI
This is a denotation that will be used when the command is to be issued in the AutoDock plugin. Make sure you check the tabs at the top, if you are having a hard time finding a button.

PyMOL>[cmd]
This indicates that you should type [cmd] in the PyMOL shell. Feel free to use PyMOL's GUI for any commands that you feel more comfortable with, but note that it is often simpler for us to give precise instructions by using text commands, for reasons described in the footnote on page 1. (Also, as you learn PyMOL's text commands, you will become faster and more versatile in PyMOL.)

:)[cmd]
This indicates that you should type [cmd] on the Konsole command line. Note that :) is not part of the actual command; rather, it denotes the shell prompt. So, do not type the :) itself; just type the text that follows immediately to the right of the closing parenthesis.

:)cd [dir]
This command, which stands for change directory, allows you to navigate the filesystem while in a shell (in Konsole).

:)cd
In this special case of cd (when no directory is specified) you will be taken back to the Home Directory.

:)ls -l
This extremely useful command shows a listing of all files in the current directory (analogous to viewing a list of all files graphically in Windows or Mac OS).

:)pymol
This command launches PyMOL from Konsole. Note that you can also maneuver the file-system from within PyMOL, using cd, in the same fashion as from within a shell (Konsole).

<TAB>
This denotes a literal TAB on the keyboard. We use this key often because <TAB> is a powerful tool both in the shell (Konsole) and in PyMOL's command line. <TAB> triggers the computer to try to finish what you began typing (this is known as tab completion). This means that if you have a long command name, for example autoligand, then, more often than not, it will suffice to begin typing aut and then press the <TAB> key and let the computer finish your thought. If there are multiple (reasonably few) options for completing a command that begins aut..., then the shell will list those potential commands, and that is handy in its own right (e.g., when you know only the beginning of a command, or can't remember what some file was called).
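To make the idea of tab completion concrete, here is a minimal Python sketch of prefix matching. It is not how Konsole or PyMOL actually implement completion; the function name complete and the list of command names are assumptions chosen purely for illustration.

```python
def complete(prefix, candidates):
    """Return every candidate that begins with the typed prefix, sorted."""
    return sorted(name for name in candidates if name.startswith(prefix))


# Illustrative list of command names (an assumption, not your actual system).
commands = ["autoligand", "autogrid4", "autodock4", "vina", "pymol"]

matches = complete("aut", commands)
if len(matches) == 1:
    # A unique match: the shell would finish the word for you.
    print("completed to:", matches[0])
else:
    # Several matches: the shell would list them and wait for more typing.
    print("candidates:", matches)
```

Typing a longer prefix (for example autol) narrows the list to a single candidate, which is the point at which a shell can finish the word on its own.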

Setting up Your Directory
In this section, you will create your [jobDirectory] and start PyMOL in Linux for the first time. For the ligand, we are using the MOL2 coordinate file downloaded from the ZINC database. If you ever need coordinates for a ligand, we advise that you start searching at the ZINC database (an online database of purchasable ligands), as your search may well end there or never be complete. Note that no pre-processing has been performed on the ligand prior to your receiving it here.

PyMOL> load [receptor].pdb
We use Protein Data Bank (PDB, .pdb) structure files for docking. The PQR format (that was also in RECEPTORS) is useful in electrostatics calculations, but that can be a topic for later analysis or discussion.

PyMOL> h_add [receptor]
This command adds hydrogens based on empty valences. (As a sidenote, the PyMOL protonation tool is not necessarily the 'best' algorithm, but its advantages are that it does not require other third-party programs or libraries, and it isn't as picky about ligands in the PDB file as other methods are.)
This opens the GUI plugin that we will use to set-up our docking calculations.
The GUI should open to the Configuration tab, which should already be set with appropriate parameters.

AGUI: Grid Setting
In this tab, we will set up the three-dimensional (3D) region of space where the molecular docking will occur. This places the center of the calculation grid (last step) at the center-of-mass of your protein, which is a good starting point; we may end up needing to adjust this in a moment (see below).
In AutoDock Vina, a 3D grid is laid over the protein, and the interaction energy of various atoms is computed at each grid point. By setting the spacing to 1.0, the other measures presented in the GUI will also be in units of Angstroms (Å), and thus more easily understood. (If this step is confusing, that is ok: the PyMOL plugin is smart enough to adjust your measurements to correct geometric amounts when it creates the configuration file.)

AGUI: In Parameters, adjust X-points, Y-points and Z-points until the grid box covers the protein.
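If you are curious about what "a grid box that covers the protein" means numerically, the following Python sketch computes a box center and edge lengths from atomic coordinates. It is only an illustration under stated assumptions: it uses the midpoint of the bounding box rather than the center-of-mass that the plugin uses, the 5 Å padding is an arbitrary choice, and the helper names (toy_atom_line, read_coordinates, grid_box) and the three toy atoms are invented for this example.

```python
def toy_atom_line(serial, name, x, y, z):
    """Build a toy ATOM record with coordinates in the fixed PDB columns 31-54."""
    return "ATOM  %5d %-4s ALA A   1    %8.3f%8.3f%8.3f" % (serial, name, x, y, z)


def read_coordinates(pdb_lines):
    """Extract (x, y, z) tuples from ATOM/HETATM records of a PDB file."""
    coords = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def grid_box(coords, padding=5.0):
    """Return (center, size) of a box enclosing all atoms plus a margin, in Å."""
    center, size = [], []
    for axis in range(3):
        values = [c[axis] for c in coords]
        low, high = min(values), max(values)
        center.append((low + high) / 2.0)          # midpoint, not center-of-mass
        size.append((high - low) + 2.0 * padding)  # margin on both sides
    return tuple(center), tuple(size)


# Three invented atoms standing in for a real receptor file.
lines = [
    toy_atom_line(1, "N", 0.0, 0.0, 0.0),
    toy_atom_line(2, "CA", 10.0, 4.0, 2.0),
    toy_atom_line(3, "C", 4.0, 8.0, -6.0),
]
center, size = grid_box(read_coordinates(lines))
print("center:", center)
print("size:  ", size)
```

With a spacing of 1.0, the edge lengths in Å correspond directly to the number of points along each axis, which is why that spacing makes the GUI values easier to interpret.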

AGUI: Receptor
In this tab, we will finish preparing the receptor for docking by saving it in AutoDock's own special format, PDBQT, which stores some additional information (beyond the coordinates in the PDB file format). The most critical piece of additional data is the bonding information for all atoms in the system. This information defines the molecular topology and also enables us to specify which bonds we will allow to freely rotate.
Here, you are simply telling PyMOL which of the objects that it is storing is the receptor (i.e., your POI, which is to be docked to).

AGUI: Press Generate Receptor ->
The PyMOL plugin will now go find the correct preparation script and will apply it to the receptor (your POI). So, just wait for it to finish and add your receptor to the Receptors list. While this is occurring, you should look in the Log field for any errors, because if any part of this setup was wrong then this step is likely to fail (not to worry, this is probably not your fault). Unfortunately, these error messages can be subtle and, sometimes, the program will continue computing but will give flawed results. If an error arises here, and you still do not understand it after researching it a little (use Google), please show your TA the error message (it may be a computer/IT problem that can be readily addressed by one of us).

AGUI: Ligands
This is exactly the same as the receptor (above), except that now you are choosing your ligand. So, give this a shot on your own.

AGUI: Docking
This is where we can print the final configuration file for AutoDock Vina. The Run Vina button seems to be broken (software is not always perfect), and so we will have to resort to the Konsole to actually run Vina.

AGUI: Press Write Vina Input File(s)
The program writes another file in your [jobDirectory], which is probably starting to look like a cluttered mess. That's OK.

Open a new Konsole and :)cd [jobDirectory].
There is a shortcut to do this, actually: In the Konsole that is running PyMOL, double-click the free area at the bottom, located beside the current tab. This opens a new tab which provides a shell that is already in the directory of the previous tab (so you don't have to navigate there again).

:)vina --config [ligand].vina_config.txt
By executing this command (type it exactly as shown, and press Enter), your computer should happily begin computing docking conformations. When this finishes, we will begin the fun part: analysis of the docked structures of the ligands to your POI (each of these is known as a docking pose). Wait for this job to run to completion, which will be apparent when the command prompt :) reappears, returning control to you (the user) from the program that just finished running.
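For the curious, a Vina configuration file is just a short text file of "key = value" lines that mirror Vina's command-line options. The Python sketch below assembles such a file and the matching command line. The file names (receptor.pdbqt, ligand.pdbqt, docked.pdbqt, docking_config.txt) and the box values are hypothetical placeholders; the file that the plugin writes for you may contain additional or different settings.

```python
def vina_config(receptor, ligand, center, size, out):
    """Build the text of a Vina configuration file, one 'key = value' per line."""
    lines = [
        "receptor = %s" % receptor,
        "ligand = %s" % ligand,
        "center_x = %.3f" % center[0],
        "center_y = %.3f" % center[1],
        "center_z = %.3f" % center[2],
        "size_x = %.3f" % size[0],
        "size_y = %.3f" % size[1],
        "size_z = %.3f" % size[2],
        "out = %s" % out,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file names and box values, for illustration only.
text = vina_config(
    receptor="receptor.pdbqt",
    ligand="ligand.pdbqt",
    center=(5.0, 4.0, -2.0),
    size=(20.0, 18.0, 18.0),
    out="docked.pdbqt",
)
print(text)

# The command you would type in Konsole, shown here as a list of words.
command = ["vina", "--config", "docking_config.txt"]
print(" ".join(command))
```

Opening your own configuration file in a text editor and comparing it with this layout is a quick way to check that the grid center and size match what you set in the Grid Setting tab.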

Analyzing Docking Results: The Mechanics
In this section, we will load the docked ligand conformations (the poses) into PyMOL for further analysis, and that will be all that is covered in this current tutorial, because analysis of the docked poses (literally, the docking results) is your job, and is specific to your POI. (Note that by 'analysis' we mean visual analysis and interpretation of the locations of the ligands [on the POI], their detailed 3D structures, inter-atomic interactions, ligand···POI contacts, etc.)
1. Open log.csv in Excel or a comparable program. If you are using a Linux workstation, we suggest LibreOffice Calc. Note that there are no headings. This is because we wish for this file to be easily loaded into any program that accepts CSV, and headings may hinder such compatibility.
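If you would rather inspect a headerless CSV file with a short script than with a spreadsheet, the Python sketch below shows one way to attach column names yourself. Everything specific here is an assumption: the sample rows are invented stand-ins for log.csv, and the column names (ligand, pose, score) are hypothetical, so check them against your actual file before relying on them.

```python
import csv
import io

# Invented stand-in for the contents of log.csv; your file will differ.
sample = "ligand_1,1,-7.2\nligand_1,2,-6.8\n"

# Hypothetical column names: the file itself carries no header row.
columns = ["ligand", "pose", "score"]

# Pair each value in a row with the column name we assigned to it.
rows = [dict(zip(columns, record)) for record in csv.reader(io.StringIO(sample))]

for row in rows:
    print(row)
```

To read a real file, replace io.StringIO(sample) with an open file handle, for example open("log.csv", newline="").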

Navigating in PyMOL
Object
This organizational unit is how PyMOL internally stores a 3D structural entity. When a protein or any other molecule is opened in PyMOL, that auto-creates one object; the next molecule that is loaded will be a new object, and so on. These objects can be edited as one group.

Object Control Panel
This is the area on the right-hand side of the PyMOL Viewer providing a list of the objects. Many of the GUI commands will be found here, and we will assume that you can explore this area on your own.

Left-click & drag
This rotates the protein representation in 3D space. Play with this for a while to become comfortable with how it works.

Right-click & up-down drag
This zooms in and out on the protein.

Scroll wheel
This changes the clip, which is the width of a slab that dictates how much depth of the 3D space (the z-direction) is rendered at once. Most of the time, it's not a bad idea to begin by increasing the clip until the entire protein can be seen (see also PyMOL's closely related 'zoom' and 'center' commands). Another way to achieve this is to type zoom in the PyMOL PGUI.

PyMOL>load [file]
This loads a structure file (e.g., in PDB format) into PyMOL, and thereby instantiates a new object corresponding to this structure.

PyMOL>save [file], [selection]
This is the command for all of PyMOL's save functionality, so it is a bit intricate. First, you specify the [file], which is what the file will be called. The name must include the extension, because that is how PyMOL determines the file format in which to save. The two important types are .pse, which is a PyMOL session file allowing you to save your work, and .pdb, which simply specifies a 3D structure in PDB format.

fetch [pdbCode]
This automatically retrieves the PDB entry from the PDB database, without your having to explicitly download it first (in fact, on Linux the PDB file will be downloaded to the local directory from which PyMOL was launched).

PyMOL>orient [object]
This resets the view to see the [object].

PyMOL>delete [object]
This removes the object from PyMOL.

Selections in PyMOL
Atom selections are a vital part of being able to manipulate molecules and subsets of molecules in PyMOL (or any other molecular visualization software environment). For high-quality molecular graphics, you will have to become quite familiar with named atom selections. Selections can be thought of as a type of object, but can contain any logical set of atoms, which can then be manipulated together as a unit (by 'logical' we mean in a Boolean sense). You can make selections with text commands or by clicking on the protein. The click method has seven modes for different selection scopes: atoms, residues, chains, segments, objects, molecules and C-alphas. To change the mode, PGUI: Mouse → Selection Mode.

PyMOL>select [selectionName], [descriptors]
This is a command that makes a selection in PyMOL using logical descriptions. The [selectionName] is what the selection will be called in the Object Control Panel, and [descriptors] is the logic statement for whether or not an atom belongs in the selection. How to form the logic statements is the topic of the rest of this section. If the [selectionName] is omitted, then the name will default to simply 'sele'.

[object]
When an object is included as part of the descriptor, then an atom must be part of that object in order to be chosen. So, if you would like to select all atoms in your receptor, the simple command would be PyMOL>select sele, [receptor]. This isn't useful in and of itself, but will often be used in logic statements (when multiple objects are loaded, e.g., your POI and a homolog to be used for structural alignment in PyMOL).

or
This Boolean logical operator combines two descriptors by selecting those atomic entities that satisfy at least one of the descriptors (i.e., it is the logical union). An example would be PyMOL>select sele, symbol N or symbol O, which would create an atom selection containing all of the oxygen and nitrogen atoms in the object.
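The logic behind an 'or' selection can be sketched in a few lines of plain Python. This is only an analogy for how a Boolean union picks atoms, not PyMOL's actual implementation; the four toy atoms and the helper name select are invented for illustration.

```python
# Four invented atoms; a real object would hold thousands.
atoms = [
    {"id": 1, "symbol": "N"},
    {"id": 2, "symbol": "C"},
    {"id": 3, "symbol": "O"},
    {"id": 4, "symbol": "S"},
]


def select(atoms, predicate):
    """Return the ids of all atoms for which the predicate is true."""
    return [atom["id"] for atom in atoms if predicate(atom)]


def is_nitrogen(atom):
    return atom["symbol"] == "N"


def is_oxygen(atom):
    return atom["symbol"] == "O"


# Analogue of: select sele, symbol N or symbol O
n_or_o = select(atoms, lambda atom: is_nitrogen(atom) or is_oxygen(atom))
print(n_or_o)
```

An atom is kept if either test passes, so the result is the union of the nitrogen-only and oxygen-only selections.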

PyMOL>color [color], [selection]
This colors the selection with the given [color]. The GUI can be used to determine which colors are available, and then this command can be used to choose a particular color (by name).

PyMOL>util.cbag [object]
This colors the atoms of the [object] with carbon = green, oxygen = red, and nitrogen = blue.

PyMOL>show [representation], [selection]
This shows the given representation of the selection. Note that it just adds the representation to those already shown; it does not remove any representations. Use the GUI to find the different available representations, then use this command as a quick way to get back to a given representation.

PyMOL>hide [representation], [selection]
This hides the representation of the molecule. Note that it just removes the one representation. A common command that one might use is hide everything, [selection]. This removes all the representations from the active display, giving you a clean slate to work with.

PyMOL>bg_color [color]
This sets the background color, and is mostly used to set the background to white for making images for presentations and papers. Many people find a black background more visually appealing and simpler to work with for 'zoomed-in', detailed analysis of a molecular scene (better contrast); a white background is often used at a more global level (at the level of protein chains in an oligomer) and is almost always used for final rendering for purposes of a manuscript, poster, presentation, etc. (less ink used in printing a poster with molecular graphics on white backgrounds).

PyMOL>set [name], [value], [selection]
This is a subtle and highly flexible command that can be used to vary literally any of PyMOL's hundreds (to thousands) of parameter settings. Some particularly useful settings to consider modifying/customizing are noted below.

PyMOL>set transparency, [value], [selection]
This adjusts the transparency of any surfaces that are rendered (whether they are actively showing or hidden). The value ranges between 0 (full opacity; the default) and 1 (full transparency).

PyMOL>set surface_color, [color], [selection]
This changes the color of the surface for the named atom selection.

PyMOL>set sphere_transparency, [value], [selection]
Same as transparency (above), but adjusts the opacity of any sphere representations, instead of surfaces.

PyMOL>ray [width]
This initiates ray-tracing of the molecular scene that is actively visible in the viewer window, yielding high-quality, photorealistic images.
[width] specifies the width (in pixels) of the final ray-traced output image (which is written to disk via the 'png' command).
For more information on molecular visualization and graphics, you can see "An Introduction to Biomolecular Graphics" by Mura et al. [1] Note that if any of the results obtained via the procedure described here are included in later work, then the convention is that you will need to cite the software used, e.g., AutoDock Vina, the AutoDock PyMOL plugin, AutoDockTools-4 (which operates behind the scenes in much of what was described above), and PyMOL. The appropriate references are [2,3,4,5].