Course: Algorithms for Biological Sequence Analysis

Fall semester, 2006

Tuesday 10:20 – 13:10, 111 CSIE Building.

3 credits

Web site: http://www.csie.ntu.edu.tw/~kmchao/seq06fall

Instructor: Kun-Mao Chao (趙坤茂)

Teaching assistant: Hung-Lung Wang (王弘倫) email: f92085@csie.ntu.edu.tw

 

Prerequisites: Some basic knowledge on algorithm development and program design is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.

 

Coursework:

Homework assignments and Class participation (15%)

Two midterm exams (30% each): November 7, 2006 and December 19, 2006 (tentatively)

        Midterm #1 (Nov. 7, 2006)

Oral presentation of selected papers (25%)

 

Our classmates: I  II  III  IV  V

 

Class PowerPoint slides:

    First Class (Sept. 19, 2006)

    Dynamic Programming -- a Quick Review (Sept. 26, 2006)
    Sequence Alignment (I, Smith-Waterman) (Sept. 26 and Oct. 3, 2006)
    Sequence Alignment (II, Various scoring scheme) (Oct. 17 and Oct. 24, 2006)
    Suboptimal Alignment (Delta points) (Oct. 31, 2006)
    Sequence_Alignment_(III, FASTA & BLAST) (Nov. 14, 2006)
    Heaviest Segments in a Number Sequence (Dec. 5, 2006)
    RMSQ (Dec. 5, 2006)

    SNP (Dec. 12, 2006)

 

Supporting materials: (Read 1~7 for the first midterm exam)

  1. Basic Concepts of DNA, Proteins, Genes and Genomes -- Kun-Mao Chao

  2. Dynamic Programming -- a Quick Review --Kun-Mao Chao

  3. Sequence Alignment --Kun-Mao Chao

  4. A Note on the Scoring Schemes -- Kun-Mao Chao

  5. Chao, K. -M., Pearson, W. R. and Miller, W. , 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487

  6. Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA.

  7. Chao, K. -M., Hardison R. C. and Miller, W. , 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.

  8. Heaviest Segments in a Number Sequence --Kun-Mao Chao

  9. RMSQ -- Kuan-Yu Chen and Kun-Mao Chao

  10. Huang, Y.-T., Zhang, K., Chen, T. and Chao, K.-M., 2005, “Selecting Additional Tag SNPs for Tolerating Missing Data in Genotyping,” BMC Bioinformatics, 6: 263.

  11. Huang, Y.-T., Chao, K.-M., and Chen, T., 2005, “An Approximation Algorithm for Haplotype Inference by Maximum Parsimony,” Journal of Computational Biology, 12: 1261-1274.

  12. Chang, C.-J., Huang, Y.-T., and Chao, K.-M., 2006, “A Greedier Approach for Finding Tag SNPs,” Bioinformatics, 22: 685-691.

  13. The Human Project and Beyond

  14. Basic Local Alignment Search Tool -- SF Altschul, W Gish, W Miller, EW Myers, DJ Lipman
    (J Mol Biol. 215(3):403-410, 1990)

  15. Gapped BLAST and PSI-BLAST -- SF Altschul, TL Madden, AA Schaffer, J Zhang, Z Zhang, W Miller and DJ Lipman
    (Nucleic Acids Research, 25(17): 3389-3402, 1997)

  16. CLUSTAL W -- J D Thompson, D G Higgins, and T J Gibson
    (Nucleic Acids Res. 22(22): 4673–4680, 1994)

Class presentations:

1.      Maximum number of team members: 6

2.      Each member is required to present in turn;

3.      Revised slides should be sent to me one week after the presentation; (Please compress your figures.)

4.      Questions in class are always welcome;

Selected papers for presentation:

  1. (November 21, 2006)
    S.F. Altschul, T. L. Madden, A.A. Sch{\"{a}}ffer, J. Zhang, Z. Zhang, W. Miller and D. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25 (1997), 3389--3402.

    Plus an introduction to NCBI.

    李定達、曾文鴻、鄭為正、佘健生

    Slides
     
  2. (November 28, 2006)
    Bin Ma, John Tromp, Ming Li. PatternHunter: faster and more sensitive homology search. Bioinformatics, 18(3):440-445. March 2002.
    [ Abstract ] [ 02ph.pdf, 310Kb ] [ BibTeX ] [ PubMed ]

    Ming Li, Bin Ma, Derek Kisman, John Tromp. PatternHunter II: Highly Sensitive and Fast Homology Search. Journal of Bioinformatics and Computational Biology, 2(3):417-439. 2004. Early version in GIW 2003..
    [ Abstract ] [ 04ph2.ps, 341Kb ] [ Download from publisher ] [ BibTeX ]

    Derek Kisman, Ming Li, Bin Ma, Li Wang. tPatternHunter: gapped, fast and sensitive translated homology search. Bioinformatics, 21(4):542-544. February 2005.
    [ Abstract ] [ Download from publisher ] [ BibTeX ] [ PubMed ]

    洪錫全、郭立翔、張智翔、鍾承宏、王凱平、莊謹譽

    Slides  Suggested_problems
     
  3. (December 26, 2006)
    W. James Kent
    BLAT-The BLAST-Like Alignment Tool
    Genome Res. 2002 12: 656-664. Published in Advance March 20, 2002, 10.1101/gr.229202. Article published online before March 2002 [Abstract] [Full Text]

    Scott Schwartz, W. James Kent, Arian Smit, Zheng Zhang, Robert Baertsch, Ross C. Hardison, David Haussler, and Webb Miller
    Human-Mouse Alignments with BLASTZ
    Genome Res. 2003 13: 103-107. [Abstract] [Full Text]  

    楊伍隆、李孟韓、陳韋仰、王煜樟

    Slides

     
  4. (January 2, 2007)
    Alignment of Whole Genomes  (148K, PDF format) by A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg.  Nucleic Acids Research, 27:11 (1999), 2369-2376.

    Fast Algorithms for Large-scale Genome Alignment and Comparison.  A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg.  Nucleic Acids Research (2002), Vol. 30, No. 11 2478-2483

    The MUMmer Homepage

    游騰楷、曾俊雄、王慧芬、杜海倫

    Slides
     
  5. (January 9, 2007)
    Michael Brudno, Chuong B. Do, Gregory M. Cooper, Michael F. Kim, Eugene Davydov, NISC Comparative Sequencing Program, Eric D. Green, Arend Sidow, and Serafim Batzoglou
    LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA
    Genome Res., Apr 2003; 13: 721 - 731 ; doi:10.1101/gr.926603

    Michael Brudno, Alexander Poliakov, Asaf Salamov, Gregory M. Cooper, Arend Sidow, Edward M. Rubin, Victor Solovyev, Serafim Batzoglou, and Inna Dubchak
    Automated Whole-Genome Multiple Alignment of Rat, Mouse, and Human
    Genome Res., Apr 2004; 14: 685 - 692 ; doi:10.1101/gr.2067704

    藍永倫、陳婕妤、文國煒、宗灝、戴邦炘

    Slides
     
References (Recommended, but not required):
1. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
2. Biological Sequence Analysis, by Richard Durbin et al. (1998)
3. Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
4. An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel Pevzner (2004)
5. Related journal and conference papers
 
Some smart guys in this field (biological sequence analysis) you might wish to know:
Webb C. Miller and Miller Lab.
Michael S. Waterman
Russell F. Doolittle
Pavel Pevzner
Eugene W. Myers
David J. Lipman
Stephen F. Altschul
William R. Pearson
Samuel Karlin
Eric S. Lander
David Sankoff
Gary D. Stormo
Daniel M. Gusfield
Warren Gish
Vineet Bafna
Minoru Kanehisa (金久  )
Ming Li (李明)
Tao Jiang (姜濤)
Xiaoqiu Huang (黃曉秋)
Louxin Zhang (張洛欣)
Ting Chen (陳梃)
… to be continued.
Kun-Mao Chao (趙坤茂; 不好意思 互相稱讚求進步~~~)
 
Useful links:
  1. NCBI    
  2. NCBI Education
  3. Molecular Biology for Computer Scientists by Lawrence Hunter
  4.  Developing Bioinformatics Computer Skills by Cynthia Gibas & Per Jambeck
  5. Sense from Sequences: Stephen F. Altschul on Bettering BLAST , July/August 2000 (A story about BLAST)
  6. Initial sequencing and analysis of the human genome (15 February 2001 Nature 409, 860 - 921 (2001)
  7.  The Sequence of the Human Genome (Science 2001 February 16; 291: 1304-1351)
  8. An evaluation of the draft human genome sequence (Nature Genetics 29, 88 - 91 (01 Sep 2001) Letters)
  9.  Initial sequencing and comparative analysis of the mouse genome (Nature 420, 520 - 562 (2002))
  10. The UCSC Genome Browser
  11. The Chimpanzee Genome (Nature; Sept. 1, 2005)
  12. The Chimpanzee Sequencing and Analysis Consortium25, "Initial sequence of the chimpanzee genome and comparison with the human genome," Nature 437, 69-87 (1 September 2005) | doi: 10.1038/nature04072
  13. The HapMap Project (Nature; Oct. 27, 2005)
    Please visit:
    http://www.nature.com/nature/journal/v437/n7063/index.html
     
    and in particular the following research articles:
    http://www.nature.com/nature/journal/v437/n7063/index.html#Article
Bibliography links: