bioinformatics

 
week 5 homework

Biology 398INA: Topics in Bioinformatics

 Homework # 5

Using Molecular Phylogeny programs

Due February 16, 2003

Please send me your answers by email. You can either create a new file or download the MS Word file and type in your answers.

week5homework.doc

Part I:  Learning about molecular phylogenies

  1. What is the basic assumption underlying a molecular phylogeny?
  2. Why must we distinguish between gene trees and species trees?
  3. Why don't genes always evolve by a series of bifurcations (i.e., by a series of single base changes)?
  4. What are the four steps to constructing a molecular phylogeny?
  5. What is an orthologous sequence?
  6. What is a paralogous sequence?
  7. Which type of sequences should you use for a species phylogeny?
  8. What is the difference between multiple sequence alignments made to discover motifs, etc., and those made for constructing phylogenies?
  9. Why is ClustalW not a very good choice for constructing species phylogenies?
  10. Go to http://www.umanitoba.ca/faculties/afs/plant_science/courses/39_769/lec07/lec07.1.html
    • What is the most important assumption in any phylogenetic model?
    • What is a second factor that is critical to phylogenetic analysis?
    • What happens if a multiple alignment is poor?
    • What is the best way to deal with parts of an alignment that are uncertain due to gaps?
    • What sorts of phylogenies are best constructed using DNA sequence alignments?
    • What sorts of phylogenies are best constructed using protein alignments?
    • What sorts of phylogenies are best constructed using ribosomal RNA sequence alignments?
    • What is a homoplasy?

Part II:  Sequence alignments

  1. What factors must you take into account when aligning DNA sequences for constructing molecular phylogenies?
  2. What is the difference between the Jukes and Cantor model and more sophisticated models for DNA substitutions (such as the Kimura model)?
  3. What factors must you take into account when aligning protein sequences for constructing molecular phylogenies?
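
As a rough illustration of how substitution models turn observed differences into distances, here is a minimal Python sketch of the standard Jukes-Cantor and Kimura two-parameter formulas. The function names and the decision to skip any column containing a gap or ambiguous base are assumptions made for this example; the Workbench programs may treat such columns differently.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_differences(seq_a, seq_b):
    """Return (transitions, transversions, compared_sites) for two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore columns with gaps or ambiguous bases
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return transitions, transversions, sites

def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: every substitution is assumed equally likely."""
    ts, tv, n = count_differences(seq_a, seq_b)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura_2p(seq_a, seq_b):
    """Kimura two-parameter distance: transitions and transversions get separate rates."""
    ts, tv, n = count_differences(seq_a, seq_b)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
```

Both functions fail with a math domain error when the sequences are so divergent that the logarithm's argument is not positive, which is itself a useful reminder that these corrections break down for saturated data.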

Part III:  Tree building

  1. Why can't we simply construct all possible trees, score each one then pick the one with the best score?
  2. What are the three general approaches used to reduce the number of trees to consider?
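
To get a feel for the scale of the problem in question 1, the following Python sketch counts the possible strictly bifurcating trees for a given number of taxa, using the standard double-factorial formulas: (2n-5)!! unrooted trees and (2n-3)!! rooted trees. The function names are made up for this example.

```python
def double_factorial(k):
    """Product k * (k-2) * (k-4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n_taxa):
    """Number of distinct unrooted bifurcating trees for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)

def rooted_trees(n_taxa):
    """Number of distinct rooted bifurcating trees for n_taxa labelled tips."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)

if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, unrooted_trees(n), rooted_trees(n))
```

Ten taxa already give 2,027,025 unrooted trees, and the count grows faster than exponentially from there.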

Part IV: Distance matrix methods

  1. What is the general approach used by distance matrix methods to construct a phylogeny?
  2. What are the main differences between UPGMA and neighbor-joining methods?
  3. What is the difference between these methods and the Fitch-Margoliash method? (You may want to read http://www.umanitoba.ca/faculties/afs/plant_science/courses/39_769/lec07/lec07.2.html to answer this one.)
  4. Go to the biologist's workbench (http://workbench.sdsc.edu/), select "alignment tools" then "CLUSTALTREE" and click "help."
    • What sort of tree does CLUSTALTREE generate?
    • What method does it use?
    • How can you infer the root of the tree?
  5. Now, select your CLUSTALW alignment from last week's assignment (it should be stored in your alignment tools folder since you used it last week for your boxshade output) and click "run."
    • What sort of output do you get?
    • What does it mean?
  6. Now, select PROTDIST under "alignment tools" and click "help."
    • What does this program do?
    • What are the three methods it uses for amino acid substitutions?
    • What is the "Categories distance"?
  7. Analyze your CLUSTALW alignment with PROTDIST
    • What sort of output do you get?
    • What does it mean?
  8. Now, select DRAWTREE under "alignment tools" and click "help."
    • What sort of tree does DRAWTREE generate?
    • What method does it use?
  9. Now analyze your CLUSTALW data with DRAWTREE, take a screenshot of the results and attach it to your homework.
  10. Select DRAWGRAM under "alignment tools" and click "help."
    • What sort of tree does DRAWGRAM generate?
    • What method does it use?
    • How does it decide where to place the root of the tree?
  11. Analyze your CLUSTALW data with DRAWGRAM, take a screenshot of the results and attach it to your homework.
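
For readers who want to see the clustering step of a distance matrix method spelled out, here is a minimal UPGMA sketch in Python. It is only an illustration of the general idea (repeatedly join the closest pair, then average distances to the new cluster); it is not the code used by CLUSTALTREE or any Workbench program, and the data layout (a dictionary keyed by `frozenset` pairs, trees as nested tuples) is an assumption chosen for brevity.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns (tree, heights): the tree as nested 2-tuples and the height
    (half the joining distance) of every internal node.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of tips it holds
    d = dict(dist)                          # working copy; new clusters are added to it
    heights = {}
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        heights[merged] = d[frozenset((a, b))] / 2
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the new cluster is the size-weighted average of the old ones.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    (root,) = clusters
    return root, heights

if __name__ == "__main__":
    example = {
        frozenset(("A", "B")): 2.0,
        frozenset(("A", "C")): 6.0,
        frozenset(("B", "C")): 6.0,
    }
    print(upgma(["A", "B", "C"], example))
```

Because UPGMA places every tip at the same distance from the root, it implicitly assumes a constant rate of change along all lineages, which is one of the points to think about when comparing it with neighbor-joining.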

Part V: Maximum parsimony methods

  1. What is the key assumption of maximum parsimony methods?
  2. How does this differ from distance matrix methods?
  3. What are the advantages of maximum parsimony methods?
  4. What are the disadvantages of maximum parsimony methods?
  5. Go to the biologist's workbench (http://workbench.sdsc.edu), select "alignment tools" then "DNAPARS" and click "help."
    • What does this program do?
    • What are the assumptions of this method?
  6. Go to the week 5 websites and add the DNA sequences to your Biologist's Workbench "Nucleic tools" folder.
  7. Align these sequences using CLUSTAL W, then click on "import alignments."
    • Your computer should go to the "alignment tools" window (if not, go there manually).
    • Select your "CLUSTALW-nucleic" alignment, "DNAPARS" and click "run." Note that you have the option of randomizing the order of input sequences.
    • What sort of output do you get?
    • What is the matrix below the most parsimonious tree?
    • Take a screenshot of your most parsimonious tree and attach it to your homework.
  8. Now select "PROTPARS" and click "help."
    • What does this program do?
    • What is the difference between the Eck & Dayhoff and Fitch methods for scoring amino acid substitutions?
    • What are the assumptions of the PROTPARS method?
  9. Now analyze your CLUSTALW alignment with PROTPARS
    • What sort of output do you get?
    • What is the matrix below the most parsimonious tree?
    • Take a screenshot of your most parsimonious tree and attach it to your homework.
    • Does this tree differ from the trees generated by any of the distance matrix programs?
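
The parsimony score that these programs minimize can be illustrated with Fitch's counting procedure for a single fixed tree. The sketch below is a simplified Python version in which every change costs one step and trees are written as nested 2-tuples of taxon names; both choices are assumptions for this example, and DNAPARS and PROTPARS do considerably more, including searching over many trees.

```python
def fitch_site(tree, states):
    """Return (possible_states, changes) for one alignment column on a fixed tree.

    tree:   a taxon name (str) or a 2-tuple of subtrees.
    states: dict mapping taxon name -> character observed at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over every column of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

if __name__ == "__main__":
    aln = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
    print(parsimony_score((("A", "B"), ("C", "D")), aln))
    print(parsimony_score((("A", "C"), ("B", "D")), aln))
```

Comparing the two scores printed by the example shows how one grouping of the same four sequences can require fewer changes than another.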

Part VI: Maximum likelihood methods

    • What are the assumptions of DNAML?
    • What was the important new development in the 3.5 release?
  1. Now go to http://www.psc.edu/general/software/packages/phylip/manual/dnamlk.html
    • What does DNAMLK do?
    • How is DNAMLK related to DNAML?
    • What are the assumptions of DNAMLK?
  2. Now go to http://www.psc.edu/general/software/packages/phylip/manual/protml.htm
    • What is PROTML?
    • Why isn't DNAML suited to protein sequence data?
    • Why can the inclusion of the third base in a codon be misleading when constructing phylogenetic trees?
    • What is the strategy taken by DNAML?
    • How does Felsenstein recommend that you compensate for the dependence on the order of species inputs?
    • What is "star decomposition"?
  3. Go to the biologist's workbench (http://workbench.sdsc.edu/), select "alignment tools" then select your "CLUSTALW-nucleic" alignment, "view aligned sequences?" and click "run."
    • In the new window, under "Format" select "Phylip interleaved" and wait for it to reformat itself.
    • Copy the text below "Ambiguous characters" from the first number (usually 6) to the very last character of the alignment
    • In the left-hand window click on "DNA" under the "4. Phylogeny Methods for DNA" window.
    • Click on "run" beneath "4. Max. Likelihood."
    • Paste your copied sequence from workbench into the Input Sequences window. Be sure that input sequence is specified as "interleaved," then click "submit."
    • Your output will appear in the upper right window.  Please copy the output tree and add it to your homework.
    • How many trees were examined?
    • In the left window, click on "Draw trees"
    • Click on "Run" below "1. Draw Cladograms."
    • Select "X-bit format" under "output format," and "yes" under "use tree file from last stage?" then click "submit."
    • Your output will appear in the upper right window.  Please copy the output tree and add it to your homework.

Part VII: Tree evaluation

  1. What are the three basic ways to resample the data for tree-building?
  2. What is jackknife resampling?
  3. What is bootstrap resampling?
  4. How does it differ from jackknife resampling?
  5. Go to the biologist's workbench (http://workbench.sdsc.edu/), select "alignment tools" then "CLUSTALTREE" and click "help."
    • What is bootstrapping?
    • What happens if you use the same seed number for different runs?
  6. Now, select your CLUSTALW alignment from last week's assignment and click "run." Under bootstrap options select compute: "Bootstrap tree", leave everything else alone, and click "submit."
  7. Copy the "Raw Phylip format" tree from the first parenthesis to the last semicolon
    • In the left window click on "7. Plot Trees."
    • Now click on "run" beneath "3. Draw Phylogenies."
    • Select "X-bit format" under "output format," and "no" under "use tree file from last stage?"
    • Paste your file from "biologist's workbench," then click "submit."
    • Your output will appear in the upper right window.  Please copy the output tree and add it to your homework.
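
The resampling ideas in this part can be sketched in a few lines of Python. The example below resamples alignment columns with replacement (bootstrap) or keeps a random subset of columns without replacement (jackknife). The function names, the dictionary layout of the alignment, and the 50% default for the jackknife are assumptions made for illustration, not the settings used by the Workbench tools.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build a pseudo-replicate by drawing columns WITH replacement.

    The replicate has the same length as the original, so some columns
    appear more than once and others not at all.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

def jackknife_alignment(alignment, rng, keep_fraction=0.5):
    """Build a pseudo-replicate by keeping a random subset of columns WITHOUT replacement."""
    length = len(next(iter(alignment.values())))
    keep = sorted(rng.sample(range(length), int(length * keep_fraction)))
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

if __name__ == "__main__":
    aln = {"A": "ACGTACGT", "B": "ACGTTCGT", "C": "ACCTTCGA"}
    # Seeding the generator makes a run repeatable.
    print(bootstrap_alignment(aln, random.Random(111)))
    print(jackknife_alignment(aln, random.Random(111)))
```

In a full analysis, a tree would be built from each pseudo-replicate and the fraction of replicates containing a given grouping would be reported as its support.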

Part VIII: Other programs

  1. Select two different programs and tell me who wrote them, what they do, and how to get them.




Last update: Tuesday, February 11, 2003 at 11:09:55 PM.