week 3 homework
Biology 398INA: Topics in Bioinformatics
Homework # 3: Playing with Sequence alignment programs
Due January 31, 2003
Please send me your answers by email. You can either create a new file
or download the MS Word file and type in your answers.
week3.doc
Part I: Learning about scoring matrices (sorry folks, in hindsight I realize
that we should have done this last week before playing with Smith-Waterman and
FASTA).
A. Go to http://workbench.sdsc.edu/; select "protein
tools," then "view available scoring matrices" and click "run."
B. You will get a new window. Select "PAM10"
in the BLAST format column and click "view matrix."
C. Open a new browser window and repeat steps A and B, except this
time select "PAM 500." (We want to compare the two matrices side
by side).
- What is the highest score in the PAM 10 matrix and which amino
acid pair gets it (remember that each number is a log-odds score: the log
of the ratio between how often the amino acid in the row replaces the amino
acid in the column in related proteins and how often the two would be
aligned by chance)?
- What is the highest score in the PAM 500 matrix and which amino
acid pair gets it?
- Why did you get the same pair for each?
- Why are the scores different?
- What is the lowest score in the PAM 10 matrix and which amino
acid pair gets it? (disregard B, Z, X and *. These are symbols
for missing or unidentified amino acids)
- What is the lowest score in the PAM 500 matrix and which amino
acid pair gets it?
- Why did you get different pairs this time?
- Why are the scores different?
- Why are the "expected scores" different for these two
matrices?
D. Click "return" at the bottom of the PAM 500 window and
select "BLOSUM 100." (Leave PAM 10 open: we want to compare the
two matrices side by side).
- What is the highest score in the Blosum 100 matrix and which amino
acid pair gets it?
- Which amino acid pair gets
the second highest score in the PAM 10 and in the Blosum 100 matrices?
- Why do you get different pairs?
- Why are the scores different?
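To make the log-odds idea behind these matrices concrete, here is a minimal Python sketch. It is not how the PAM or BLOSUM tables were actually built (the real matrices use their own scaling and rounding), and every frequency and score in it is a made-up toy value. The function names `log_odds_score` and `expected_score` are illustrative only.

```python
import math


def log_odds_score(pair_freq, freq_a, freq_b):
    """Log2 of (observed frequency of the aligned pair in related proteins)
    divided by (frequency expected if the two residues paired by chance)."""
    return math.log2(pair_freq / (freq_a * freq_b))


def expected_score(background, scores):
    """Average score per aligned position for two unrelated sequences:
    the sum over all pairs of p(a) * p(b) * score(a, b)."""
    return sum(
        background[a] * background[b] * scores[a][b]
        for a in background
        for b in background
    )


# Toy numbers (assumptions, not taken from any real matrix).
# A rare residue that is almost always conserved scores high:
rare_conserved = log_odds_score(pair_freq=0.01, freq_a=0.0125, freq_b=0.0125)
# A common residue that is conserved about as often as chance scores near 0:
common_neutral = log_odds_score(pair_freq=0.01, freq_a=0.1, freq_b=0.1)

# Expected score for a two-letter toy alphabet.
toy_background = {"A": 0.5, "B": 0.5}
toy_scores = {"A": {"A": 1, "B": -2}, "B": {"A": -2, "B": 1}}
toy_expected = expected_score(toy_background, toy_scores)

print(rare_conserved, common_neutral, toy_expected)
```

A negative expected score, as in the toy alphabet, means that random alignments lose points on average, which is what lets a local alignment stand out from the background.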
Part II: using nucleotide sequence alignment programs
A. Go to Entrez (http://www.ncbi.nlm.nih.gov/Entrez), select “nucleotide”
and type in the words “sonic hedgehog.”
B. Select NM_005631, Homo sapiens smoothened homolog.
C. Copy bases 781-1080 and use them for a BLASTN query of the non-redundant
GenBank database (your choice of site; just tell me where you did this).
- Note that if you do this at Biologist's Workbench you will need
to select the databases to search. I recommend selecting GenBank invertebrates
through to GenBank Rodent sequences (you can select up to 16 at a time). You
will probably need to submit them as a batch.
- Also note that you can do this using MacVector, following the instructions
in the appendix at the end of this document. You will find this feature of
MacVector very useful when we use ClustalW for multiple alignments.
- How many hits did you get using BLASTN?
- How many were statistically significant?
- Now, run the same query using the EST database (dbEST).
- How many hits did you get?
- How many were statistically significant?
- Now use FASTA3 (your choice of site) and the non-redundant database.
- How many hits did you get?
- How many were statistically significant?
- Now use Ssearch3 (your choice of site) and the non-redundant database.
- How many hits did you get?
- How many were statistically significant?
- Did any program find significant matches missed by the others?
- Did the order of significant matches vary?
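If you would rather cut out bases 781-1080 with a script than by hand, the sketch below shows one way to do it in Python. It assumes you have pasted the sequence block of the GenBank flatfile into a string; the helper names `clean_flatfile_sequence` and `subsequence` are made up for this example, and the placeholder sequence is not the real NM_005631 sequence.

```python
def clean_flatfile_sequence(text):
    """Strip the position numbers and spaces that a GenBank flatfile puts
    in its sequence block, keeping only the letters (upper-cased)."""
    return "".join(ch for ch in text if ch.isalpha()).upper()


def subsequence(seq, start, end):
    """Return residues start..end using 1-based, inclusive coordinates
    (the GenBank convention), so 781-1080 gives exactly 300 bases."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and exclude the end, hence start - 1.
    return seq[start - 1:end]


# Placeholder input (an assumption): paste the real sequence block here.
pasted = """
        1 acgtacgtac gtacgtacgt
       21 ttaaccggtt aaccggttaa
"""
sequence = clean_flatfile_sequence(pasted)
print(subsequence(sequence, 3, 12))
```

The same function works for the amino acid ranges used later, since protein coordinates in the flatfile are also 1-based and inclusive.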
Part III: using translated DNA alignment programs
- Use the first 300 bases of the NM_005631 sequence for
a blastx query using standard settings and nr database.
- How many hits did you get using BLASTX?
- What was wrong?
- Now use bases 781-1080 of the NM_005631
sequence for a blastx query using standard settings and nr database.
- How many hits did you get?
- How many were statistically significant?
- Did you find anything that was missed by BLASTN?
- Now use bases 781-1080 of the NM_005631
sequence for a FASTY3 query using standard settings and nr database.
- How many hits did you get?
- How many were statistically significant?
- Did you find anything that was missed by FASTA3?
- Did you find anything that was missed by BLASTX?
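Translated searches such as BLASTX compare protein translations of your DNA query against a protein database, trying all six reading frames. The Python sketch below shows what a six-frame translation looks like, using the standard genetic code. It is only an illustration of the idea, not the code any of these programs uses; the function names are made up, and codons containing anything other than A, C, G or T are reported as "X".

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon from the first base; '*' marks a stop."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate both strands in all three offsets: frames +1..+3, -1..-3."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAAGGC").items():
    print(name, protein)
```

Looking at where the stop codons (`*`) fall in each frame is a quick way to judge whether a stretch of DNA is likely to be protein coding.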
Part IV: tweaking the settings.
- Note, the BLASTP settings are the extremes available at NCBI. If you
use a different site, try the extreme values and tell me what they were.
- Note, the FASTA settings are the extremes available at EBI. If you use
a different site, try the extreme values and tell me what they were.
- Use amino acids 161-276 (rows 4-5 of the translation on the GenBank
flatfile) for a blastp query using the nr database, BLOSUM 45, and gap
costs (10,3), i.e. gap opening 10 and gap extension 3.
- How many hits did you get?
- How many were statistically significant?
- What was the score of the best hit?
- Now use the same sequence (amino acids 161-276) for a blastp query using
the nr database, and BLOSUM 45, (19,1).
- How many hits did you get?
- How many were statistically significant?
- What was the score of the best hit?
- Why is this score different?
- Did you find anything missed in step 1?
- Now use the same sequence (amino acids 161-276) for a blastp query using
the nr database, and BLOSUM 80, (6,2).
- How many hits did you get?
- How many were statistically significant?
- What was the score of the best hit?
- Why is this score different?
- Did you find anything missed in step 1?
- Now use the same sequence (amino acids 161-276) for a blastp query using
the nr database, and BLOSUM 80, (11,1).
- How many hits did you get?
- How many were statistically significant?
- What was the score of the best hit?
- Why is this score different?
- Did you find anything missed in step 1?
- Now use the same sequence (amino acids 161-276) for a FASTA3 query using
the swall database, and BLOSUM 50, (-18,-8).
- How many hits did you get?
- How many were statistically significant?
- What was the score of the best hit?
- Why is this score different?
- Did you find anything missed in step 1?
- Now use the same sequence (amino acids 161-276) for a FASTA3 query using
the swall database, and BLOSUM 50, (0,0).
- How many hits did you get?
- How many were statistically significant?
- What was the score of the best hit?
- Why is this score different?
- Did you find anything missed in step 1?
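When thinking about why the scores change with the gap settings, it can help to work out what a single gap costs under each pair of values. The Python sketch below assumes the pairs above are (gap opening, gap extension) and that a gap of k residues costs opening + k × extension, which is my understanding of the NCBI convention. FASTA sites write the penalties as negative numbers and may count the first residue of a gap differently, so check the help page of the site you use.

```python
def affine_gap_cost(gap_open, gap_extend, length):
    """Cost of one gap spanning `length` residues, assuming
    cost = gap_open + gap_extend * length."""
    if length < 1:
        raise ValueError("a gap must span at least one residue")
    return gap_open + gap_extend * length


# The four BLASTP settings from this part of the homework.
settings = {
    "BLOSUM 45 (10,3)": (10, 3),
    "BLOSUM 45 (19,1)": (19, 1),
    "BLOSUM 80 (6,2)": (6, 2),
    "BLOSUM 80 (11,1)": (11, 1),
}

for label, (gap_open, gap_extend) in settings.items():
    costs = [affine_gap_cost(gap_open, gap_extend, k) for k in (1, 5, 10)]
    print(label, "cost of gaps of length 1, 5, 10:", costs)
```

Note that a high opening cost with a low extension cost makes short gaps expensive but long gaps relatively cheap, and the reverse is true for a low opening cost with a high extension cost.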
Part VI: PSI-BLAST
- Note, the PSI-BLAST settings are available at NCBI. If you use a different
site and these are not available, tell me what you used.
- Now use the same sequence (amino acids 161-276) for a PSI-BLAST query
using the nr database, and BLOSUM 45, (10,3).
- How many hits did you get?
- How many were statistically significant?
- What was the score of the best hit?
- Run PSI-BLAST iteration 2
- How many new hits did you get?
- How many were statistically significant?
- What was the score of the first new sequence?
- Did the score of the best hit change?
- Did the order of the best hits change?
- Run PSI-BLAST iteration 3
- How many new hits did you get?
- How many were statistically significant?
- What was the score of the first new sequence?
- Did the score of the best hit change?
- Did the order of the best hits change?
- Did you find any hits not previously identified by BLAST or FASTA?
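If you save the hit lists from each iteration, a few lines of Python can do the bookkeeping for the questions above. This is only a sketch: the accession names and E-values are invented, the function names are hypothetical, and the 0.01 cutoff for "statistically significant" is an assumption, so use whatever threshold you state in your answers.

```python
def significant(hits, cutoff=0.01):
    """Keep the accessions whose E-value is at or below the cutoff.
    `hits` is a list of (accession, e_value) pairs in report order."""
    return [accession for accession, e_value in hits if e_value <= cutoff]


def compare_rounds(previous, current):
    """Return (new accessions in `current`, whether the shared ones
    appear in a different order than in `previous`)."""
    previous_set, current_set = set(previous), set(current)
    new_hits = [a for a in current if a not in previous_set]
    shared_before = [a for a in previous if a in current_set]
    shared_after = [a for a in current if a in previous_set]
    return new_hits, shared_before != shared_after


# Invented example data (not real search results).
iteration1 = [("hit_A", 1e-50), ("hit_B", 1e-20), ("hit_C", 0.5)]
iteration2 = [("hit_B", 1e-60), ("hit_A", 1e-55), ("hit_D", 1e-8), ("hit_C", 2.0)]

new_hits, order_changed = compare_rounds(
    significant(iteration1), significant(iteration2)
)
print("new significant hits:", new_hits)
print("order of shared hits changed:", order_changed)
```

The same comparison can be used to check whether one program found significant matches that another missed.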
Part VII: Finding related structures
A. Go to Entrez (http://www.ncbi.nlm.nih.gov/Entrez), select "structures,"
and type “sonic hedgehog.”
B. Click on 1VHH
C. A new page will appear, with a pink bar labeled "chain."
Click on this bar.
D. A new page will appear, labeled "VAST Structure Neighbors."
Select the first 3 (2BKJ A, 1LBU 2, 1IQO A 1), then click the "View 3D
structure" button. (You will need to have Cn3D installed for this to
work.)
E. When Cn3D opens you will get a 3-dimensional rendering of the
4 superimposed structures, and a 1-dimensional rendering listing the overlapping
residues. I find that the "worms" rendering shortcut under the style
menu works best.
- Please take a screen shot of these two overlaps, and attach it
to your homework (please save it as a JPEG or TIFF).
- How many overlapping regions are there?
- What are these three proteins (2BKJ A, 1LBU 2, 1IQO A 1)?
- Were any identified as relatives by BLAST or FASTA?
F. Now go to combinatorial extension (http://cl.sdsc.edu/ce.html)
G. Under the FIND menu, select "ALL"
H. Type 1VHH in the "specify protein chain" window, then
click on "search database".
- How many hits did you get?
- What sorts of proteins are they?
- Select the first three alignments.
- Click on "Get alignment," then on “Press to start
Compare3D.”
- Please take a screen shot of the alignments, and attach it to
your homework (please save it as a jpeg or TIFF)
Appendix: Using ENTREZ and BLAST within MacVector
- Start MacVector
- Click on "Database," then select "Internet Entrez search"
from the dropdown menu.
- In the upper left corner select "accession number," enter "NM_005631"
(note that you can also search by gene name, organism, etc.), then click "search."
- You should get an entry in the "matches" window, and a
more complete description in the "documents" window. Click on the
"to desk" button.
- A sequence file will open on your desktop. Go to the Database menu
and select "Internet BLAST search."
- A new window will open that allows you to customize your search.
Click on the "more choices" button.
- Note that you have 3 choices of program, many databases and matrices,
many choices for the open gap cost, etc.
- Proceed as in Part II C
