week 2 homework
Biology 398INA: Topics in Bioinformatics
Homework #2: Using Data Mining and Sequence Alignment Programs
Due January 24, 2003
Please send me your answers by email. You can either create a
new file or download the MS Word file and type in your answers.
week 2 homework.doc
Part I: Learning about Data Mining
- What is "knowledge discovery?"
- What is "directed knowledge discovery?"
- Please give an example of directed knowledge discovery.
- What is "undirected knowledge discovery?"
- Please give an example of undirected knowledge discovery.
- What tools for data mining are available at NCBI? http://www.ncbi.nlm.nih.gov
- Go to http://industry.ebi.ac.uk/~brazma/dm.html and click on "ISMB98
tutorial slides." What conclusions did the authors make?
- Go to http://bioinformatics.weizmann.ac.il/cards/knowledge.html.
- Why is software for data mining sometimes called "siftware?"
- What are the stages of knowledge discovery?
- Go to http://www.digimine.com/usama/datamine/
- What are the titles of the articles published in the January 2003
(volume 7, issue 1) issue of "Data Mining and Knowledge Discovery"?
- Go to http://www.kdnuggets.com/solutions/bioinformatics.html
- Click on Aber Genomic Computing, then on its products. How does evolutionary
computing work?
- Pick 2 other links listed on the "bioinformatics solutions"
page and tell me what they do.
- Return to http://www.kdnuggets.com/solutions/bioinformatics.html and
click on "Companies in Bioinformatics."
- What is the lead clinical candidate being tested by Genome Therapeutics?
- Pick 2 other companies listed on this page (besides Human Genetic
Sciences, Incyte and Incellico) and tell me what they do.
- Go to http://www.incyte.com/ and click on "Focus: Learn how genomics
is advancing research into therapeutic antibodies."
- What is the key to developing antibody therapies?
- Go to http://www.incellico.com/
- How does CELL™ break the informatics bottleneck?
- Go to http://www.gene.com/gene/research/biotechnology/genomics.jsp
- Why is Genentech focusing on secreted proteins?
- How many genes has it sequenced?
- Go to http://www.millennium.com/ and click on "R&D,"
then on "R&D Engine," then on "technology platform."
- How does Millennium use bioinformatics?
- What other technologies are listed in its technology platform?
- Please list three accomplishments of its Metabolic disease discovery
research.
- Go to http://www.hgsi.com/ and click on "technology."
- What is the basis of its functional proteomics program?
- What other technologies are they developing?
Part II: Learning about Sequence Alignment
- Go to http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html
- What is the basic premise underlying similarity searching?
- What is the general approach used for similarity searching?
- How do we quantify similarities?
- Now go to http://www.inf.ethz.ch/personal/cannaroz/courses/compbio/week2/week2/week2.html
- What is an alignment?
- What is the underlying assumption if 2 sequences are aligned?
- How do we align?
- What are the key features of the Markovian Model of evolution?
- How do we score an alignment?
- What does PAM stand for?
- Now go to http://www.psc.edu/biomed/training/tutorials/sequence/db/index.html
- Why is database searching different from sequence alignment?
- Why is database searching comparable to a laboratory experiment?
- What are the sources of prior knowledge in database searching?
- How does Smith-Waterman differ from Needleman-Wunsch?
- Why is Smith-Waterman more sensitive than FASTA or BLAST?
- How does FASTA differ from Smith-Waterman?
- How does BLAST differ from Smith-Waterman?
- What conclusions do the authors come up with for database searching?
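As a rough illustration of the global versus local distinction raised in the questions above, the following minimal Python sketch computes a Needleman-Wunsch (global) score and a Smith-Waterman (local) score for the same pair of sequences. It assumes a simple match/mismatch score and a linear gap penalty instead of a PAM or BLOSUM matrix. The function names, parameter values, and toy sequences are illustrative assumptions and do not come from the tutorials linked above.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score when ALL of a is aligned to ALL of b."""
    # Row 0: aligning an empty prefix of a against b costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over any pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # borders are 0: an alignment may start anywhere
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets a poorly scoring alignment be abandoned.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)  # an alignment may also end anywhere
        prev = curr
    return best


if __name__ == "__main__":
    x, y = "AAAAGATTACA", "GATTACACCCC"  # toy sequences sharing "GATTACA"
    print("global:", global_score(x, y))
    print("local: ", local_score(x, y))
```

The two functions differ only in the zero borders, the extra 0 in the maximum, and where the final score is read from. Those three changes are what let the local version pick out the shared "GATTACA" region while the global version is penalized for the unrelated flanking residues.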
Part III: Using Smith-Waterman and FASTA
- Copy the BRCA1 query sequence from the week 2 websites page (this
is the one-letter code for part of the protein encoded by the BRCA1 gene,
a tumor suppressor that repairs DNA damage).
- Go to http://www.ddbj.nig.ac.jp/ and click on "homology search"
- Select S&W search. Under "program" select swp
(protein query vs protein database)
- Under database select "default"
- Scroll down to QUERY SEQUENCE NAME and enter BRCA1
- Paste the BRCA1 sequence into the "copy and paste"
window
- Enter your email address in the mail address window.
- Scroll down to "detailed option."
- How many protein matrices can you choose?
- Select Blosum 30, then send
- How many hits do you get?
- Repeat this process, except this time select Blosum 62
- How many hits do you get?
- Repeat this process, except this time select Blosum 90
- How many hits do you get?
- Return to http://www.ddbj.nig.ac.jp/ and click
on "homology search"
- Select FASTA search. Under "program" select FASTA
(protein query vs protein database)
- Under database select "default"
- Scroll down to QUERY SEQUENCE NAME and enter BRCA1
- Paste the BRCA1 sequence into the "copy and paste"
window
- Enter your email address in the mail address window.
- Scroll down to "detailed option."
- How many protein matrices can you choose?
- Select Blosum 50, then send
- How many hits do you get?
- Repeat this process, except this time select Blosum 62
- How many hits do you get?
- Repeat this process, except this time select PAM250
- How many hits do you get?
- Go to http://decypher.stanford.edu/index_by_algo.htm and click on
"Smith-Waterman Protein-Protein"
- Choose the Swiss-Prot Release 40.32 database
- Paste the BRCA1 query sequence into the paste window
- How many weight matrices can you choose from?
- Select Blosum 30, then send
- How many hits do you get?
- Repeat this process, except this time select Blosum 90
- How many hits do you get?
- Repeat this process, except this time select PAM 10
- How many hits do you get?
- Repeat this process, except this time select PAM 500
- How many hits do you get?
- Go to http://workbench.sdsc.edu/ and set up an account if you haven't
already done this.
- Select "protein tools," then "add new sequence"
and click "run."
- You will get a new window. Under "label" enter "BRCA1,"
then paste your sequence into the "sequence" window and click
"save."
- Return to the "protein tools" page. You should see
a new entry labeled
"BRCA1." Click the box next to it, then select SSearch and click "run."
- You will get a new window. Select "run as batch" and
"non-redundant protein database." Leave everything else alone,
then click submit.
- How many hits do you get?
- Return to the "protein tools" page.
- Select FASTA, then "submit."
- You will get a new window. Select "run as batch" and
"non-redundant protein database." Leave everything else alone,
then click submit.
- How many hits do you get?
- Of these three websites, which was the easiest to use?
- Which was the most powerful?
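One reason the Smith-Waterman and FASTA searches in Part III can return different numbers of hits is that FASTA first looks for short identical words (k-tuples) shared by the query and each database sequence, and only examines promising diagonals further. The Python sketch below shows that word-matching idea in its simplest form. The function names, the `min_words` cutoff, and the toy peptide sequences are hypothetical, and the real program goes on to rescore and join regions using a substitution matrix, which is not shown here.

```python
from collections import defaultdict


def ktup_index(seq, k=2):
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_words_by_diagonal(query, subject, k=2):
    """Count identical k-letter words per diagonal (query pos - subject pos)."""
    index = ktup_index(query, k)
    diagonals = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            diagonals[i - j] += 1
    return dict(diagonals)


def passes_prefilter(query, subject, k=2, min_words=2):
    """Hypothetical cutoff: keep a subject only if some diagonal has
    at least min_words identical words in common with the query."""
    diagonals = shared_words_by_diagonal(query, subject, k)
    return max(diagonals.values(), default=0) >= min_words


if __name__ == "__main__":
    query = "MKVLAAGIT"                  # toy peptide, not BRCA1
    print(passes_prefilter(query, "QQMKVLQQ"))   # shares MK, KV, VL on one diagonal
    print(passes_prefilter(query, "MRVIAGGLT"))  # similar overall, few identical words
```

In this toy case the second subject shares only one identical two-letter word with the query, so the prefilter drops it, whereas a full dynamic-programming comparison would still score every position. That trade of sensitivity for speed is worth keeping in mind when comparing hit counts across the three sites.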
