bioinformatics

 
Will Terzaghi's Homepage




Week 10 lecture

Visualizing 3-D molecular structures

The results of X-ray crystallography, NMR, computer modeling, etc. are all recorded as files that list the sequence of the molecule and the coordinates of each atom relative to an arbitrary origin near the center of the molecule.

The sequence is very important, as it allows us to decide which amino acid each atom belongs to, which neighboring atoms it is bonded to and what sorts of bonds are formed.

Structural biochemists sometimes call the sequence the “chemical graph” of a molecule.

Two basic approaches are used to record 3-D data:

One approach, such as that used by PDB files, is to list the amino acid sequence using the three-letter code, then give the coordinates for each atom. Some files (such as the example below) also provide information about secondary structure (lines 9-13) and about special features such as disulfide bonds (lines 14-16).

  • The amino acid sequence is called the explicit sequence, since it states the order of the amino acids directly
  • The list of atoms and their coordinates is called the implicit sequence, since you can deduce which atoms are bonded to which using "chemistry rules"
  • PDB files do not store most bonds explicitly; instead, the programs that read them apply “chemistry rules” to infer the bonds
    • for example, two carbons about 1.5 Å apart are joined by a single bond
  • Programs that interpret PDB files to render 3-D images reconstruct bonds by consulting tables of bond lengths and types for every conceivable pair of bonded atoms
  • Problems
    • each exception to the rules must be recorded and dealt with on a case-by-case basis
    • many structures are incomplete, i.e. they are missing coordinates for some atoms
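
The distance-based "chemistry rules" described above can be illustrated with a minimal Python sketch. It does not reproduce the lookup tables of any particular viewer: it swaps in a common heuristic (two atoms are bonded if their distance is no more than the sum of their covalent radii plus a tolerance). The radii are approximate values, and the names `COVALENT_RADIUS`, `TOLERANCE` and `infer_bonds` are assumptions made for this example.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (illustrative values only).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}
TOLERANCE = 0.4  # assumed slack, in angstroms, added to the sum of the radii


def infer_bonds(atoms):
    """Return index pairs (i, j) of atoms that are close enough to be bonded.

    atoms is a list of (element, (x, y, z)) tuples.
    """
    bonds = []
    for (i, (el_a, xyz_a)), (j, (el_b, xyz_b)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADIUS[el_a] + COVALENT_RADIUS[el_b] + TOLERANCE
        if math.dist(xyz_a, xyz_b) <= cutoff:
            bonds.append((i, j))
    return bonds


# Two carbons 1.5 angstroms apart are bonded; the distant oxygen is not.
example = [("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)), ("O", (5.0, 0.0, 0.0))]
print(infer_bonds(example))
```

A sketch like this also shows why exceptions are a problem: any atom pair that falls inside the cutoff is reported as bonded, whether or not the chemistry supports it, and atoms missing from the file simply produce no bonds.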

snippet from a PDB file (I edited out ~40 lines of comments)

  • Lines 1-4 identify the compound and the source and list the authors
  • Next there were many lines of comments which I have edited out
  • The SEQRES lines (lines 5-8) give the primary sequence of the protein
  • Lines 9-13 provide information about secondary structure
  • Lines 14-16 provide information about disulfide bonds.
  • Lines 17-23 provide information about the crystal, the coordinate system used, and the scale used to convert the actual coordinates provided by the crystallographer to fractional coordinates (ranging from -1 to +1)
  • The coordinates for each atom start on line 24
    • column 1 lists the name of the record
    • column 2 lists the serial number of the atom
    • column 3 lists the name of the atom
    • column 4 lists the name of the amino acid the atom belongs to
    • column 5 lists the residue sequence number
    • column 6 lists the X coordinate
    • column 7 lists the Y coordinate
    • column 8 lists the Z coordinate
    • column 9 lists the occupancy of that position
    • column 10 lists the temperature factor, a measure of the confidence in the position of the atom
    • Additional columns may be used to designate the charge on an atom, the element symbol, which chain of the molecule this is (when dealing with proteins that have multiple subunits), and other information
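
The fields listed above can be pulled out of a coordinate record with a short Python sketch. It assumes the standard fixed-width layout of ATOM records in the PDB format, where each field occupies fixed character positions rather than being separated by whitespace. The `Atom` type, the `parse_atom_line` function and the sample record are made up for this illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    record: str
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Slice one fixed-width ATOM record into typed fields."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# Made-up record, assembled piece by piece so the field widths are explicit.
SAMPLE = (
    "ATOM  "      # characters 1-6: record name
    "    1"       # 7-11: atom serial number
    " "           # 12: unused
    " N  "        # 13-16: atom name
    " "           # 17: alternate location indicator
    "MET"         # 18-20: residue name
    " "           # 21: unused
    "A"           # 22: chain identifier
    "   1"        # 23-26: residue sequence number
    "    "        # 27-30: insertion code and unused characters
    "  27.340"    # 31-38: X coordinate
    "  24.430"    # 39-46: Y coordinate
    "   2.614"    # 47-54: Z coordinate
    "  1.00"      # 55-60: occupancy
    "  9.67"      # 61-66: temperature factor
    "          "  # 67-76: unused
    " N"          # 77-78: element symbol
)

atom = parse_atom_line(SAMPLE)
print(atom.name, atom.res_name, atom.res_seq, atom.x, atom.y, atom.z)
```

Slicing by character position rather than splitting on whitespace matters in practice, because neighboring fields in real files can run together without a space between them.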

mmCIF (macromolecular Crystallographic Information File) files store data in a similar way, but use a different and more complex set of relational tables.

  • Software is available for converting PDB files to mmCIF format, and for converting mmCIF to "pseudoPDB" format
  • Many other file formats also use this general approach of supplying a list of coordinates for each atom in a molecule
  • These formats can generally be converted to PDB format

The Molecular Modeling Data Base (MMDB) at NCBI uses a different approach to store 3-D data.

  • MMDB uses a "standard residue dictionary": a record of all the atoms and bonds in proteins and nucleic acids, as well as the variants found at the front and back ends of proteins and nucleic acids.
  • Software that reads MMDB files uses this dictionary to connect atoms
    • This works much faster, since the software doesn't need to apply the rules of chemistry
    • It is also more consistent, since the software doesn't need to interpret rules or deal with exceptions to them (exceptions are included in the dictionary)
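
The dictionary approach can be contrasted with distance-based bond inference in a small Python sketch. This is a toy illustration, not the actual MMDB data format: the `RESIDUE_DICTIONARY` contents (heavy atoms only, two residues) and the `bonds_for_chain` function are hypothetical.

```python
# Toy residue dictionary: heavy atoms and the bonds between them (hydrogens omitted).
RESIDUE_DICTIONARY = {
    "GLY": {
        "atoms": ["N", "CA", "C", "O"],
        "bonds": [("N", "CA"), ("CA", "C"), ("C", "O")],
    },
    "ALA": {
        "atoms": ["N", "CA", "C", "O", "CB"],
        "bonds": [("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB")],
    },
}


def bonds_for_chain(residues):
    """Look up every bond in a chain given only its residue names, in order.

    Each bond is returned as ((residue_index, atom_name), (residue_index, atom_name)).
    No coordinates or distance calculations are needed.
    """
    bonds = []
    for index, name in enumerate(residues):
        for atom_a, atom_b in RESIDUE_DICTIONARY[name]["bonds"]:
            bonds.append(((index, atom_a), (index, atom_b)))
        if index > 0:
            # Peptide bond joining the previous residue's C to this residue's N.
            bonds.append(((index - 1, "C"), (index, "N")))
    return bonds


for bond in bonds_for_chain(["GLY", "ALA"]):
    print(bond)
```

Because the bonds come from a lookup rather than from geometry, the result is the same however distorted or incomplete the coordinates are, which is the consistency advantage noted above.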

Software for rendering 3D images from PDB files

  • General problem: interpreting the molecular coordinates and creating a 3-D image that can be rotated in all directions around the center point
    • many programs also allow you to present the image in various formats such as backbone, space-fill, cartoon, etc.
    • many allow you to color the image according to various criteria
    • many allow you to highlight specific portions of the molecule according to user-specified criteria
  • Many programs have been written for this purpose.
  • One of the most widely used is RasMol
    • RasMol reads molecular coordinates from a variety of file types, such as PDB, CHARMm and MOL, and renders a 3-D image
    • This image can be displayed in a variety of modes, including backbone, ball-and-stick, spacefill and cartoon (for illustrating secondary structure)
    • This image can be displayed in a variety of color schemes to highlight features such as secondary structure, temperature factor (the confidence in the position of each atom), groups, etc.
    • The image can be rotated to view from any angle
    • The image can be printed or exported in a variety of formats
    • There are a variety of sophisticated commands that allow you to look at specific parts of the molecule and display specific features
      • Unfortunately, these commands must be typed in, so learning to take full advantage of RasMol's capabilities takes quite a while
      • RasMol can only deal with one image at a time
  • Several different programs have modified or enhanced RasMol
    • Chime is a web-browser plugin based on RasMol that renders molecular structures within your browser. It also lets you rotate them and change their appearance and color scheme in the same way as RasMol.
    • Protein Explorer is an improved version based on Chime and RasMol that is more powerful and more user-friendly.
      • Features for highlighting particular regions or particular interactions are now available by pointing and clicking.
      • Animations can be played
      • Evolution of a molecule can be followed using the ConSurf server
      • Two molecules can be viewed and compared at the same time.
  • Swiss-PDB viewer is another powerful program for rendering PDB files
    • It is designed for visualizing and editing the output from SWISS-MODEL, so it contains many powerful features for evaluating portions of the structure and modifying it (and then evaluating the modifications)
      • You can perform interactive Ramachandran plot manipulation
      • You can perform energy minimization on selected portions of the model
    • Everything can be selected from lists and menus. It is not as user-friendly as Protein Explorer, but easier than RasMol
    • Only wire-frame mode can be viewed on-screen, but images can be exported to POV-Ray or QuickDraw 3D for high-quality rendering.
  • Many Unix command-line programs are available
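
The general problem stated at the top of this list, rotating the structure around its center point, comes down to two steps: shift the coordinates so the center sits at the origin, then multiply each point by a rotation matrix. The Python sketch below illustrates this under simple assumptions; the function names `center` and `rotate_y` are made up here, and a real viewer would also handle projection, depth cueing and drawing.

```python
import math


def center(coords):
    """Translate coordinates so that their geometric center lies at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rotate_y(coords, degrees):
    """Rotate already-centered coordinates about the Y axis by the given angle."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in coords]


# Made-up coordinates for three atoms.
atoms = [(10.0, 2.0, 5.0), (12.0, 2.0, 5.0), (11.0, 4.0, 5.0)]
for point in rotate_y(center(atoms), 30):
    print(tuple(round(value, 3) for value in point))
```

Centering first is what makes the molecule appear to spin in place; rotating the raw coordinates would instead swing the whole molecule around the file's arbitrary origin.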

Cn3D uses MMDB files

  • Especially important for NMR data: RasMol shows just one image, whereas Cn3D shows all potential structures

  • Cn3D simultaneously displays structure, sequence, and alignment.

Modeling protein-ligand interactions

  • General problem: trying to model how proteins interact with other molecules, and how they change shape upon binding.
  • Ligand-docking algorithms model the interactions between a protein of known structure and its ligand (which may be another protein)
    • This is conceptually similar to modeling protein folding, and similar approaches are used
    • Both approaches try to find the conformation with the lowest free energy
  • These procedures start by identifying the binding site within the protein
    • One way to do this is to obtain protein crystals with a substrate or inhibitor bound in the active site.
    • If such crystals are not available, algorithms scan for likely binding sites on the protein
    • Alternatively, infer the binding site by comparing the structures of compounds that are known to interact with the same protein
  • The next step is simulating the annealing process
    • Two general approaches:

    1) Lock-and-key approaches assume a rigid protein and a flexible ligand

    • hold the protein shape constant and alter the shape of the ligand
    • use energy minimization, Monte Carlo simulations, genetic algorithms, etc. to estimate changes in shape
    • Advantage: simpler programming

    2) Induced-fit algorithms allow both the protein and the ligand to change shape

    • many start by holding the protein shape constant and altering the shape of the ligand
    • then use energy minimization, etc. to try to estimate the change in protein shape
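
A lock-and-key search of the kind described above can be sketched in Python as a Metropolis Monte Carlo simulation: the protein atoms stay fixed, one ligand atom is nudged at random, and the move is kept if it lowers the energy (or occasionally even if it raises it). Everything here is a toy under stated assumptions: the energy function (a Lennard-Jones term between protein and ligand atoms plus a harmonic term holding neighboring ligand atoms together), its parameters, and the names `pair_energy`, `total_energy` and `dock` are hypothetical and do not come from any docking package.

```python
import math
import random


def pair_energy(r, epsilon=1.0, sigma=3.5):
    """Toy Lennard-Jones energy between one protein atom and one ligand atom."""
    r = max(r, 0.5)  # avoid division by zero when atoms overlap
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def total_energy(protein, ligand, bond_length=1.5, k=10.0):
    """Protein-ligand interaction energy plus a harmonic penalty on ligand bonds."""
    energy = sum(pair_energy(math.dist(p, a)) for p in protein for a in ligand)
    for a, b in zip(ligand, ligand[1:]):
        energy += k * (math.dist(a, b) - bond_length) ** 2
    return energy


def dock(protein, ligand, steps=2000, step_size=0.3, temperature=1.0, seed=0):
    """Metropolis Monte Carlo search over ligand coordinates with a rigid protein."""
    rng = random.Random(seed)
    current = [tuple(a) for a in ligand]
    e_current = total_energy(protein, current)
    best, e_best = current, e_current
    for _ in range(steps):
        i = rng.randrange(len(current))
        moved = tuple(c + rng.uniform(-step_size, step_size) for c in current[i])
        trial = current[:i] + [moved] + current[i + 1:]
        delta = total_energy(protein, trial) - e_current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_current + delta
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


# Made-up coordinates: three fixed protein atoms and a two-atom ligand.
protein_atoms = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.0, 3.5, 0.0)]
ligand_start = [(2.0, 1.0, 5.0), (3.5, 1.0, 5.0)]
start_energy = total_energy(protein_atoms, ligand_start)
pose, final_energy = dock(protein_atoms, ligand_start)
print(round(start_energy, 3), round(final_energy, 3))
```

An induced-fit version would follow the same loop but also propose moves for the protein atoms, which is why it is more expensive: the number of coordinates being searched grows with the size of the protein.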

Rational Drug Design: applied protein-ligand interactions

Drug design is a two-step process:

  1. Drug discovery
  2. Drug testing

Bioinformatics can't do much (yet) to speed up drug testing, but it can significantly accelerate drug discovery:

  • Accelerate target identification
    • metabolic modeling allows us to identify molecules that are essential for infection or proliferation of a pathogen (we will come back to this topic later) and estimate the effect of inhibiting this target. We can also use metabolic modeling to identify the best target in a signalling or metabolic pathway and simulate the effect of inhibiting it
  • Rational drug design
    • design drugs that will specifically bind to the active site (or regulatory sites) within a target protein

General approach

  1. Identify suitable target molecule
  2. Obtain structure of this molecule
  3. Identify the binding site within this molecule
  4. Identify the pharmacophore: the atoms that form the important 3D points of interaction between a drug and its target needed to elicit a response.
  5. Design drugs that will match the features of a pharmacophore and, therefore, fit into the binding site.

We will come back to the question of metabolic simulation later in the course, and we have already dealt with ways to obtain the structure and identify the binding site within the target.

  • One way to identify the pharmacophore is to dock inhibitors and obtain X-ray crystallography or NMR structures
    • This requires very high resolution images!
  • An alternative is to identify a series of compounds acting via the same target and then see what structures they all have in common
    • for an example of this approach see http://dtp.nci.nih.gov/docs/3d_database/pharms/pkcsearch.html
    • used phorbol esters to identify the pharmacophore for Protein Kinase C
    • Many pharmaceutical companies are using combinatorial chemistry to make libraries of chemicals
      • Use high-throughput screening to identify lead chemicals: compounds which show some effect on the target.
      • Once leads have been identified, use simulations to identify potential pharmacophores and to eliminate chemicals which overlap
        http://www.netsci.org/Science/Combichem/feature05.html
    • An alternative, once the active site on the target protein has been identified, is to screen databases of molecular structures for potential ligands
      • Many databases of structures for small molecules are available
      • Many programs have been developed for rapidly screening these databases for potential ligands using various heuristic approaches
        • FlexX allows you to search online (once you are registered).
        • SLIDE is another algorithm that allows you to rapidly screen large databases of potential ligands.
        • Accelrys markets a suite of programs for rational drug design
  • Once the pharmacophore has been identified, many algorithms have been developed to help design better drugs
    • Goals
      • designing drugs that will bind to regions of target proteins that do not change in pathogens such as viruses that evade the immune system by rapidly mutating
      • designing drugs that bind with tailored affinity constants
        • sometimes reversible binding is better than irreversible binding!
      • Chem-X uses defined centers (hydrogen bond acceptor, hydrogen bond donor, positive charge, aromatic, hydrophobic, acid, base) and defined distance intervals to create a set of pharmacophores
      • Many other companies (e.g. Accelrys) offer programs that allow you to design superior drugs once you have identified the binding site and pharmacophore
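
The idea of describing a pharmacophore by typed centers and distance intervals can be illustrated with a short Python sketch. It is not the algorithm used by Chem-X or any other product: it is a brute-force check, with made-up names (`match_pharmacophore`) and made-up coordinates, of whether a candidate molecule has features of the right types at the right distances from each other.

```python
import math
from itertools import permutations


def match_pharmacophore(center_types, distance_intervals, features):
    """Try to map each pharmacophore center onto a feature of the molecule.

    center_types: list of type names, one per pharmacophore center.
    distance_intervals: {(i, j): (minimum, maximum)} in angstroms between centers i and j.
    features: list of (type_name, (x, y, z)) found in the candidate molecule.
    Returns a tuple of feature indices (one per center), or None if nothing fits.
    """
    for assignment in permutations(range(len(features)), len(center_types)):
        if any(features[f][0] != t for f, t in zip(assignment, center_types)):
            continue  # wrong feature type for at least one center
        if all(
            low <= math.dist(features[assignment[i]][1], features[assignment[j]][1]) <= high
            for (i, j), (low, high) in distance_intervals.items()
        ):
            return assignment
    return None


# Hypothetical three-center pharmacophore.
centers = ["donor", "acceptor", "hydrophobic"]
intervals = {(0, 1): (2.5, 3.5), (0, 2): (4.0, 6.0), (1, 2): (4.0, 6.0)}

# Hypothetical candidate molecule, reduced to its typed features.
candidate = [
    ("hydrophobic", (0.0, 5.0, 0.0)),
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
]
print(match_pharmacophore(centers, intervals, candidate))
```

Using distance intervals rather than exact distances is what lets one pharmacophore match many different molecules, and it is also the part of the description that can be tightened or loosened when screening a database.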

 

 

 




Last update: Tuesday, October 14, 2003 at 6:31:51 PM.