Course: Algorithms for Biological Sequence Analysis

Spring semester, 2011

9:10 - 12:10 Tuesday, 107 CSIE Building.

3 credits

Web site: http://www.csie.ntu.edu.tw/~kmchao/seq11spr

Instructor: Kun-Mao Chao (趙坤茂)

Teaching assistant: An-Chiang Chu (朱安強)

 

Prerequisites: Some basic knowledge on algorithm development and program design is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.

 

Coursework:

    Homework assignments and Class participation (10%)

    Two midterm exams (70%; 35% each):

        1. April 12, 2011  Scoreboard  I  II  by An-Chiang

        2. May 24, 2011

    Oral presentation of selected papers (20%)

 

Our classmates: I  II (by An-Chiang)
III [Fire alarm. Run!]  IV [An-Chiang, the Master]
V [Thank you for being the background of this camera.] (by Kun-Mao)

 

Class PowerPoint slides:

    Introduction [2/22/2011]

    Interesting Sequences [2/22/2011]

    Basic Algorithmic Strategies [3/1/2011]

    Sequence Alignment [3/8/2011][Thank Chih-Hsin for his corrections.]

    Space-Saving Strategies [3/15/2011]

    Suboptimal Alignments [3/22/2011]

    Multiple Sequence Alignment [3/29/2011]

    Homology Search Tools [4/19/2011; 4/26/2011]

    SNP [4/26/2011; 5/10/2011]

    Heaviest Segments [5/10/2011; 20115/17]

    RMSQ [2011/5/17]

 

Supporting materials:

  1. Molecular Biology for Computer Scientists by Lawrence Hunter

  2. Basic Algorithmic Strategies by Kun-Mao Chao

  3. Global Alignment by Kun-Mao Chao

  4. Local Alignment by Kun-Mao Chao

  5. Various Scoring Schemes by Kun-Mao Chao

  6. Space-Saving Strategies by Kun-Mao Chao

  7. Chao, K. -M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.

  8. Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.

  9. Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA.

  10. Chao, K. -M., Hardison R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.

  11. Chao, K.-M., 1996, On Building an Interactive Alignment Tool, International Symposium on Combinatorics and Applications (SOCA '96), 84-93, Mainland China.

  12. Multiple Sequence Alignment by Kun-Mao Chao

  13. Homology Search Tools by Kun-Mao Chao

  14. ClustalW

  15. Huang, Y.-T., Zhang, K., Chen, T. and Chao, K.-M., 2005, “Selecting Additional Tag SNPs for Tolerating Missing Data in Genotyping,” BMC Bioinformatics, 6: 263.

  16. Huang, Y.-T., Chao, K.-M., and Chen, T., 2005, “An Approximation Algorithm for Haplotype Inference by Maximum Parsimony,” Journal of Computational Biology, 12: 1261-1274.

  17. Chang, C.-J., Huang, Y.-T., and Chao, K.-M., 2006, “A Greedier Approach for Finding Tag SNPs,” Bioinformatics, 22: 685-691.

  18. A Note on Computing Heaviest Segments by Kun-Mao Chao

  19. Chen, K.-Y. and Chao, K.-M.,  2007, “On the Range Maximum-Sum Segment Query Problem,” Discrete Applied Mathematics, 155: 2043-2052.

 

Homework assignments:

#1: Handout: Feb.22, 2011; Due: March 4, 2011

    Send TA (anchiang [not not at] gmail.com) a PowerPoint page including: (Subject: [HW1] Your student ID number)

   A sample solution

   More sample solutions: 2007  2008  2009

   2011 [3/8/2011]

 

Class presentations:

1.      The expected number of team members: ~ 8;

2.      Each member is required to present in turn;

3.      Revised slides should be sent to me one week after the presentation; (Please compress your figures.)

4.      Questions in class are always welcome.

 

Selected papers for presentation:

 

5/31/2011

[1] Paul G. Howard and Jeffrey Scott Vitter "Practical Implementations of Arithmetic Coding" Image and Text Compression, ed. by James A. Storer, Kluwer Academic Publication, 1992. More references: http://www.data-compression.info/Algorithms/AC/index.htm


蔡佩真  楊鈞傑  吳浩庠  李佳憲  李枝新  朱民晃  吳彥緯  姚甯之  黃信博

Slides

 

[2] Q. Wang, D. Korkin, and Y. Shang, “A Fast Multiple Longest Common Subsequence (MLCS) Algorithm,” IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 23(3), 2011.


施羽芩  李鴻欣  劉士弘  周緯志  林耿生  江蘇峰  黃安婷  張世杰  潘彥謙

Slides

 

6/7/2011

[3] Eugene Myers and Webb Miller, "Optimal Alignments in Linear Space," CABIOS (Bioinformatics) 9: 169-176, 1988.


黃博平  陳彥璋  李政緯  涂宗瑋  王柏易  林澤豪  林萬泉

Slides

 

[4] Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396. (PostScript version)


洪晧瑜  賴宜萍  陳筱雯  魏昭寧  蔡長蓉  吳冠龍  何家全  林琬瑜

Slides

 

6/14/2011

[5] Jan Fostier, Sebastian Proost, Bart Dhoedt, Yvan Saeys, Piet Demeester, Yves Van de Peer, and Klaas Vandepoele, "A greedy, graph-based algorithm for the alignment of multiple homologous gene lists," Bioinformatics,  (2011) 27 (6): 749-756.


林建旻  葉士賢  黃柏翔  翁正勳  凌璟騰  麻立恆  高國書  駱家淮  陳振翔

 

[6] Quentin D. Atkinson, "Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa," Science (15 April 2011): Vol. 332 no. 6027 pp. 346-349.

 

      黃曳弘  程士洵  廖堃吉  林怡萱

 
References (Recommended, but not required):
1. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
2. Biological Sequence Analysis, by Richard Durbin et al. (1998)
3. Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
4. An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel Pevzner (2004)
5. Chao, K.-M. and Zhang, L. (2009) Sequence Comparison: Theory and Methods,” Springer. (210 pages; ISBN 978-1848003194)
6. Related journal and conference papers
 
Some smart guys in this field (biological sequence analysis) you might wish to know:
Webb C. Miller
Michael S. Waterman
Russell F. Doolittle
Pavel Pevzner
Eugene W. Myers
David J. Lipman
Stephen F. Altschul
William R. Pearson
Samuel Karlin
Eric S. Lander
David Sankoff
Gary D. Stormo
Daniel M. Gusfield
Warren Gish
Martin Vingron
Vineet Bafna
Minoru Kanehisa (金久  )
Osamu Gotoh (後藤  修)
Ming Li (李明)
Tao Jiang (姜濤)
Xiaoqiu Huang (黃曉秋)
Louxin Zhang (張洛欣)
Ting Chen (陳梃)
... to be continued.
Kun-Mao Chao (趙坤茂; 互相稱讚求進步~~~)
 
Useful links:
  1. NCBI    
  2. NCBI Education
  3. Molecular Biology for Computer Scientists by Lawrence Hunter
  4.  Developing Bioinformatics Computer Skills by Cynthia Gibas & Per Jambeck
  5. Sense from Sequences: Stephen F. Altschul on Bettering BLAST , July/August 2000 (A story about BLAST)
  6. Initial sequencing and analysis of the human genome (15 February 2001 Nature 409, 860 - 921 (2001)
  7.  The Sequence of the Human Genome (Science 2001 February 16; 291: 1304-1351)
  8. An evaluation of the draft human genome sequence (Nature Genetics 29, 88 - 91 (01 Sep 2001) Letters)
  9.  Initial sequencing and comparative analysis of the mouse genome (Nature 420, 520 - 562 (2002))
  10. The UCSC Genome Browser
  11. The Chimpanzee Genome (Nature; Sept. 1, 2005)
  12. The Chimpanzee Sequencing and Analysis Consortium25, "Initial sequence of the chimpanzee genome and comparison with the human genome," Nature 437, 69-87 (1 September 2005) | doi: 10.1038/nature04072
  13. The HapMap Project (Nature; Oct. 27, 2005)
    Please visit:
    http://www.nature.com/nature/journal/v437/n7063/index.html
     
    and in particular the following research articles:
    http://www.nature.com/nature/journal/v437/n7063/index.html#Article
Bibliography links: