bioinformatics

 
Will Terzaghi's Homepage




week 2 homework

Biology 398INA: Topics in Bioinformatics
Homework # 2: Using Data mining and Sequence alignment programs
Due January 24, 2003

Please send me your answers by email. You can either create a new file or download the MS Word file below and type in your answers.
week 2 homework.doc


Part I: Learning about Data Mining

  1. What is "knowledge discovery?"
  2. What is "directed knowledge discovery?"
  3. Please give an example of directed knowledge discovery.
  4. What is "undirected knowledge discovery?"
  5. Please give an example of undirected knowledge discovery.
  6. What tools for data mining are available at NCBI? http://www.ncbi.nlm.nih.gov
  7. Go to http://industry.ebi.ac.uk/~brazma/dm.html and click on "ISMB98 tutorial slides." What conclusions did the authors reach?
  8. Go to http://bioinformatics.weizmann.ac.il/cards/knowledge.html.
    • Why is software for data mining sometimes called "siftware?"
    • What are the stages of knowledge discovery?
  9. Go to http://www.digimine.com/usama/datamine/
    • What are the titles of the articles published in the January 2003 issue (volume 7, issue 1) of "Data Mining and Knowledge Discovery"?
  10. Go to http://www.kdnuggets.com/solutions/bioinformatics.html
    • Click on Aber Genomic Computing, then on its products. How does evolutionary computing work?
    • Pick 2 other links listed on the "bioinformatics solutions" page and tell me what they do.
  11. Return to http://www.kdnuggets.com/solutions/bioinformatics.html and click on "Companies in Bioinformatics."
    • What is the lead clinical candidate being tested by Genome Therapeutics?
    • Pick 2 other companies listed on this page (besides Human Genome Sciences, Incyte and Incellico) and tell me what they do.
  12. Go to http://www.incyte.com/ and click on "Focus: Learn how genomics is advancing research into therapeutic antibodies."
    • What is the key to developing antibody therapies?
  13. Go to http://www.incellico.com/
    • How does CELL™ break the informatics bottleneck?
  14. Go to http://www.gene.com/gene/research/biotechnology/genomics.jsp
    • Why is Genentech focusing on secreted proteins?
    • How many genes has it sequenced?
  15. Go to http://www.millennium.com/ and click on "R&D," then on "R&D Engine," then on "technology platform."
    • How does Millennium use bioinformatics?
    • What other technologies are listed in its technology platform?
    • Please list three accomplishments of its Metabolic disease discovery research.
  16. Go to http://www.hgsi.com/ and click on "technology."
    • What is the basis of its functional proteomics program?
    • What other technologies are they developing?

Part II: Learning about Sequence Alignment

  1. Go to http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html
    1. What is the basic premise underlying similarity searching?
    2. What is the general approach used for similarity searching?
    3. How do we quantify similarities?
  2. Now go to http://www.inf.ethz.ch/personal/cannaroz/courses/compbio/week2/week2/week2.html
    1. What is an alignment?
    2. What is the underlying assumption if 2 sequences are aligned?
    3. How do we align?
    4. What are the key features of the Markovian Model of evolution?
    5. How do we score an alignment?
    6. What does PAM stand for?
  3. Now go to http://www.psc.edu/biomed/training/tutorials/sequence/db/index.html
    1. Why is database searching different from sequence alignment?
    2. Why is database searching comparable to a laboratory experiment?
    3. What are the sources of prior knowledge in database searching?
    4. How does Smith-Waterman differ from Needleman-Wunsch?
    5. Why is Smith-Waterman more sensitive than FASTA or BLAST?
    6. How does FASTA differ from Smith-Waterman?
    7. How does BLAST differ from Smith-Waterman?
    8. What conclusions do the authors come up with for database searching?
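Part II asks how Smith-Waterman differs from Needleman-Wunsch. Both fill in a dynamic-programming table, and the difference comes down to a few lines. The Python sketch below is illustrative only: it assumes a toy scoring scheme (+2 for identical residues, -1 for a mismatch, -2 per gapped position) in place of the BLOSUM or PAM matrices and gap settings the real programs use, and `align_score`, `score_pair` and the constants are hypothetical names. The local (Smith-Waterman) version never lets a cell fall below zero and reports the best cell anywhere in the table; the global (Needleman-Wunsch) version penalizes end gaps and reports the bottom-right cell.

```python
# Toy scoring scheme (hypothetical values, not BLOSUM or PAM):
MATCH = 2       # identical residues
MISMATCH = -1   # different residues
GAP = -2        # linear penalty per gapped position


def score_pair(a: str, b: str) -> int:
    """Score one aligned column of two residues."""
    return MATCH if a == b else MISMATCH


def align_score(x: str, y: str, local: bool = False) -> int:
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    n, m = len(x), len(y)
    # h[i][j] = best score for prefixes x[:i] and y[:j]
    # (in local mode: best score of an alignment ending at that cell)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: leading gaps cost points, so the borders are penalized.
        for i in range(1, n + 1):
            h[i][0] = i * GAP
        for j in range(1, m + 1):
            h[0][j] = j * GAP
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(
                h[i - 1][j - 1] + score_pair(x[i - 1], y[j - 1]),  # pair x[i-1] with y[j-1]
                h[i - 1][j] + GAP,  # x[i-1] against a gap
                h[i][j - 1] + GAP,  # y[j-1] against a gap
            )
            if local:
                # Smith-Waterman: never drop below zero, so a local
                # alignment may start at any cell.
                cell = max(cell, 0)
            h[i][j] = cell
            best = max(best, cell)
    # Global uses the bottom-right cell; local uses the best cell anywhere.
    return best if local else h[n][m]


if __name__ == "__main__":
    query = "HEAGAW"
    subject = "XXXXHEAGAWXXXX"  # hypothetical: the query embedded in unrelated residues
    print("global:", align_score(subject, query))              # -4
    print("local: ", align_score(subject, query, local=True))  # 12
```

With the query embedded in unrelated residues, the local score counts only the matching stretch, while the global score is dragged down by the gaps needed to cover the flanks; this is one reason local alignment is used for database searching.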

Part III: Using Smith-Waterman and FASTA

  1. Copy the BRCA1 query sequence from the week 2 websites page. (This is the one-letter code for part of the protein encoded by the BRCA1 gene, a tumor suppressor that repairs DNA damage.)
  2. Go to http://www.ddbj.nig.ac.jp/ and click on "homology search"
    • Select S&W search. Under "program" select swp (protein query vs protein data base)
    • Under database select "default"
    • Scroll down to QUERY SEQUENCE NAME and enter BRCA1
    • Paste the BRCA1 sequence into the "copy and paste" window
    • Enter your email address in the mail address window.
    • Scroll down to "detailed option."
    • How many protein matrices can you choose?
    • Select Blosum 30, then send
    • How many hits do you get?
  3. Repeat this process, except this time select Blosum 62
    • How many hits do you get?
  4. Repeat this process, except this time select Blosum 90
    • How many hits do you get?
  5. Return to http://www.ddbj.nig.ac.jp/ and click on "homology search"
    • Select FASTA search. Under "program" select FASTA (protein query vs protein data base)
    • Under database select "default"
    • Scroll down to QUERY SEQUENCE NAME and enter BRCA1
    • Paste the BRCA1 sequence into the "copy and paste" window
    • Enter your email address in the mail address window.
    • Scroll down to "detailed option."
    • How many protein matrices can you choose?
    • Select Blosum 50, then send
    • How many hits do you get?
  6. Repeat this process, except this time select Blosum 62
    • How many hits do you get?
  7. Repeat this process, except this time select PAM250
    • How many hits do you get?
  8. Go to http://decypher.stanford.edu/index_by_algo.htm and click on "Smith-Waterman Protein-Protein"
    • Choose the Swiss-Prot Release 40.32 database
    • Paste the BRCA1 query sequence into the paste window
    • How many weight matrices can you choose from?
    • Select Blosum 30, then send
    • How many hits do you get?
  9. Repeat this process, except this time select Blosum 90
    • How many hits do you get?
  10. Repeat this process, except this time select PAM 10
    • How many hits do you get?
  11. Repeat this process, except this time select PAM 500
    • How many hits do you get?
  12. Go to http://workbench.sdsc.edu/ and set up an account if you haven't already done this.
    • Select "protein tools," then "add new sequence," and click "run."
    • You will get a new window. Under "label" enter "BRCA1," then paste your sequence into the "sequence" window and click "save."
    • Return to the "protein tools" page. You should see a new entry labeled "BRCA1." Click the box next to it, then select SSearch and click "run."
    • You will get a new window. Select "run as batch" and "non-redundant protein database." Leave everything else alone, then click submit.
    • How many hits do you get?
  13. Return to the "protein tools" page.
    • Select FASTA, then "submit."
    • You will get a new window. Select "run as batch" and "non-redundant protein database." Leave everything else alone, then click submit.
    • How many hits do you get?
  14. Of these three websites, which was the easiest to use?
  15. Which was the most powerful?
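Part III has you switch between PAM 10, PAM250 and PAM 500, and Part II asks about the Markovian model behind PAM. Under that model, the substitution probabilities for an evolutionary distance of n PAMs are obtained by multiplying the PAM1 matrix by itself n times, and those probabilities are then turned into log-odds scores. The Python sketch below assumes a made-up three-residue alphabet and PAM1-like matrix (not Dayhoff's data) and an unscaled, simplified log-odds formula; `pam_n`, `log_odds_bits` and the constants are hypothetical names. It shows why high-numbered PAM matrices suit distantly related sequences: as n grows, the reward for an identical pair shrinks toward zero and mismatches become less costly. (BLOSUM numbering runs the other way: BLOSUM30 is aimed at more divergent sequences, BLOSUM90 at closely related ones.)

```python
import math

# Made-up "PAM1-like" mutation probability matrix for hypothetical residues
# A, B, C: row i gives P(residue i is replaced by residue j) after 1 PAM
# (about 1 accepted point mutation per 100 residues). The real PAM1 is
# 20 x 20 and was estimated from observed alignments.
PAM1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
]
BACKGROUND = [1 / 3, 1 / 3, 1 / 3]  # assumed equal residue frequencies


def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def pam_n(pam1, n):
    """Markov assumption: n PAMs of change is PAM1 applied n times."""
    size = len(pam1)
    # Start from the identity matrix (PAM0: no change yet).
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


def log_odds_bits(m, background):
    """Simplified log-odds score in bits: log2(P(i -> j) / frequency of j).

    Published PAM tables are scaled and rounded to integers; this is not.
    """
    size = len(m)
    return [
        [math.log2(m[i][j] / background[j]) for j in range(size)]
        for i in range(size)
    ]


if __name__ == "__main__":
    for n in (10, 250, 500):
        s = log_odds_bits(pam_n(PAM1, n), BACKGROUND)
        print(f"PAM{n}: A-A {s[0][0]:+.2f} bits, A-B {s[0][1]:+.2f} bits")
```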




Last update: Tuesday, January 21, 2003 at 10:33:12 AM.