Course: Algorithms for Biological Sequence Analysis

Fall semester, 2008

Wednesday 13:20 - 16:20, 107 CSIE Building.

3 credits

Web site: http://www.csie.ntu.edu.tw/~kmchao/seq08fall

Instructor: Kun-Mao Chao (趙坤茂)

Teaching assistant: Yi-Ching Chen (陳怡靜)  email: d94010@csie... (TA office hours: 10:20~12:10 Wednesday)

 

Prerequisites: Some basic knowledge on algorithm development and program design is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.

 

Coursework:

    Homework assignments and Class participation (15%)

    Two midterm exams (60%; 30% each):

        1. October 29, 2008

        2. December 17, 2008

    Oral presentation of selected papers (25%)

 

Our classmates: I  II  III

 

Class PowerPoint slides:

    Introduction (Sept. 17, 2008)

    Interesting Sequences (Sept. 17, 2008)

    Basic Algorithmic Strategies (Sept. 24, 2008)

    Sequence Alignment (Oct. 1, 2008 and Oct. 8, 2008)

    Space-Saving Strategies (Oct 8, 2008 and Oct 15, 2008)

    Suboptimal Alignments (Oct. 15 and 22, 2008)

    Homology Search Tools (Nov. 5, 2008)
    Multiple Sequence Alignment (Nov. 12, 2008)

    SNP (Nov. 19, 2008 and Nov. 26, 2008)

    Heaviest Segments (Nov. 26, 2008)

    RMSQ (Dec. 3, 2008)

 

Supporting materials:

  1. Molecular Biology for Computer Scientists by Lawrence Hunter

  2. Basic Algorithmic Strategies by Kun-Mao Chao

  3. Global Alignment by Kun-Mao Chao

  4. Local Alignment by Kun-Mao Chao

  5. Various Scoring Schemes by Kun-Mao Chao

  6. Space-Saving Strategies by Kun-Mao Chao

  7. Chao, K. -M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487

  8. Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA.

  9. Chao, K. -M., Hardison R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.

  10. Homology Search Tools by Kun-Mao Chao

  11. Multiple Sequence Alignment by Kun-Mao Chao

  12. ClustalW

  13. Huang, Y.-T., Zhang, K., Chen, T. and Chao, K.-M., 2005, “Selecting Additional Tag SNPs for Tolerating Missing Data in Genotyping,” BMC Bioinformatics, 6: 263.

  14. Huang, Y.-T., Chao, K.-M., and Chen, T., 2005, “An Approximation Algorithm for Haplotype Inference by Maximum Parsimony,” Journal of Computational Biology, 12: 1261-1274.

  15. Chang, C.-J., Huang, Y.-T., and Chao, K.-M., 2006, “A Greedier Approach for Finding Tag SNPs,” Bioinformatics, 22: 685-691.

  16. A Note on Computing Heaviest Segments by Kun-Mao Chao

  17. Chen, K.-Y. and Chao, K.-M.,  2007, “On the Range Maximum-Sum Segment Query Problem,” Discrete Applied Mathematics, 155: 2043-2052.

 

Homework assignments:

#1: Handout: September 17, 2008; Due: September 29, 2008

    Send TA a PowerPoint page including: (Subject: [HW1] Your student ID number)

   A sample solution

   More sample solutions

   Your solutions

 

#2: Handout: Oct. 22, 2008; Due: Nov. 17, 2008

    Send your source code and execution file to TA (Subject: [HW2] Your student ID number)

    Implement Hirschberg's approach described in Section 1 of the note Space-Saving Strategies. (Figure 4 is the main pseudo-code you need to implement.) Please deliver both optimal global alignment and optimal local alignment. More details.

    Note: The boundary conditions of Figure 4 were corrected on Nov. 5, 2008.

 

Class presentations:

1.      The expected number of team members: 5~6

2.      Each member is required to present in turn;

3.      Revised slides should be sent to me one week after the presentation; (Please compress your figures.)

4.      Questions in class are always welcome;

Selected papers for presentation:

December 24, 2008
Reconstructing the Evolutionary History of Complex Human Gene Clusters
Yu Zhang, Giltae Song, Tomáš Vinař, Eric D. Green, Adam Siepel, and Webb Miller
RECOMB 2008
陳冠宇  廖鳳玉  林蔚茵  莊秋芸  黃雉穎

December 31, 2008
A novel DNA sequence database for analyzing human demographic history
Jeffrey D. Wall, Murray P. Cox, Fernando L. Mendez, August Woerner, Tesa Severson, and Michael F. Hammer
Genome Research; Aug 2008; 18: 1354 - 1361
余相甫  高孟駿  陳翰霖  李孟哲  周秉誼  周揚儒

January 7, 2008
Direct mapping and alignment of protein sequences onto genomic sequence
Osamu Gotoh
Bioinformatics, 1 November 2008; 24: 2438 - 2444.
盧子彬  呂若陽  翁小涵  李建樂  郭建鴻

References (Recommended, but not required):
1. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
2. Biological Sequence Analysis, by Richard Durbin et al. (1998)
3. Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
4. An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel Pevzner (2004)
5. Chao, K.-M. and Zhang, L. (2008) Sequence Comparison: Theory and Methods,” Springer. (210 pages; ISBN 978-1848003194)
6. Related journal and conference papers
 
Some smart guys in this field (biological sequence analysis) you might wish to know:
Webb C. Miller
Michael S. Waterman
Russell F. Doolittle
Pavel Pevzner
Eugene W. Myers
David J. Lipman
Stephen F. Altschul
William R. Pearson
Samuel Karlin
Eric S. Lander
David Sankoff
Gary D. Stormo
Daniel M. Gusfield
Warren Gish
Martin Vingron
Vineet Bafna
Minoru Kanehisa (金久  )
Osamu Gotoh (後藤  修)
Ming Li (李明)
Tao Jiang (姜濤)
Xiaoqiu Huang (黃曉秋)
Louxin Zhang (張洛欣)
Ting Chen (陳梃)
… to be continued.
Kun-Mao Chao (趙坤茂; 不好意思 互相稱讚求進步~~~)
 
Useful links:
  1. NCBI    
  2. NCBI Education
  3. Molecular Biology for Computer Scientists by Lawrence Hunter
  4.  Developing Bioinformatics Computer Skills by Cynthia Gibas & Per Jambeck
  5. Sense from Sequences: Stephen F. Altschul on Bettering BLAST , July/August 2000 (A story about BLAST)
  6. Initial sequencing and analysis of the human genome (15 February 2001 Nature 409, 860 - 921 (2001)
  7.  The Sequence of the Human Genome (Science 2001 February 16; 291: 1304-1351)
  8. An evaluation of the draft human genome sequence (Nature Genetics 29, 88 - 91 (01 Sep 2001) Letters)
  9.  Initial sequencing and comparative analysis of the mouse genome (Nature 420, 520 - 562 (2002))
  10. The UCSC Genome Browser
  11. The Chimpanzee Genome (Nature; Sept. 1, 2005)
  12. The Chimpanzee Sequencing and Analysis Consortium25, "Initial sequence of the chimpanzee genome and comparison with the human genome," Nature 437, 69-87 (1 September 2005) | doi: 10.1038/nature04072
  13. The HapMap Project (Nature; Oct. 27, 2005)
    Please visit:
    http://www.nature.com/nature/journal/v437/n7063/index.html
     
    and in particular the following research articles:
    http://www.nature.com/nature/journal/v437/n7063/index.html#Article
Bibliography links: