Course: Algorithms for Biological Sequence Analysis

Spring semester, 2004

Wednesday 10:20 ¡V 13:00, 309 CSIE Building.

3 credits

Web site: http://www.csie.ntu.edu.tw/~kmchao/seq04spr

Instructor: Kun-Mao Chao (»¯©[­Z)

Teaching assistant: Rung-Ren Lin (ªL®e¥ô)  email: r91054@csie.ntu.edu.tw

 

Prerequisites: Some basic knowledge on algorithm development and program design is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.

 

Preliminary course outlines

 

Coursework:

Programming assignments (25%)

Midterm exam (40%)

Final project (Oral presentation of selected papers) (30%)

Class participation (5%)

 

Our classmates ¡K

        Classmates (I)

        Classmates (II)

        Classmates (III)

        Classmates (IV)

        Classmates (V)

        Classmates (VI)

        Classmates in TIGP (Bioinformatics)

 

Class PowerPoint slides:

        Dynamic Programming Strategies

        Sequence Alignment

        Space-Saving Strategies

        Deltapoints.ppt

        Hidden Markov Models

        Band.jpg

        Deltapoints.jpg

 

 

Assigned reading materials:

Lin, Y. ¡VL., Jiang, T. and Chao, K. ¡VM., 2002, ¡§Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis,¡¨ Journal of Computer and System Sciences, 570-586.
(PDF file)

Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics, M. H. Goldwasser, M.-Y. Kao, and Hsueh-I Lu in Proceedings of the 2nd Workshop on Algorithms in Bioinformatics (WABI 2002), Rome, Italy, September 17-21, 2002, LNCS 2452, pp. 157-171. (A revised version is here.)

Chao, K. -M., Pearson, W. R. and Miller, W. , 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.

E. Myers and W. Miller, ``Optimal Alignments in Linear Space,'' CABIOS 4, 1 (1988), 11-17.

Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.

Chao, K. -M., Hardison R. C. and Miller, W. , 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.

Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA.

 

Class notes:

Chao, K.-M., ¡§Dynamic-programming strategies for analyzing biomolecular sequences.¡¨ (To be included in the book of "Selected Topics in Post-Genome Knowledge Discovery" (Chapter 1), 2004.)

Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, ¡§Multiple Alignment in Linear Space,¡¨ Manuscript.

Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.

Chao, K. -M., 1998, ¡§On Computing all Suboptimal Alignments,¡¨ Information Sciences, 105: 189-207.

S. Wu, E. Myers, U. Manber, and W. Miller, ``An O(NP) Sequence Comparison Algorithm,'' Information Processing Letters 35, 6 (1990), 317-323.

Chao, K. -M. and Miller, W. , 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.

 

Programming assignments: (Given 3/17/2004; Due 4/7/2004)

ü          Penalties for late submissions: # of days ¡Ñ 5%

ü          Please email your source code + execution file to TA

ü          Subject (¥D¦®): sequence HW_(n)
(For example, ¡§sequence HW_(1)¡¨ for assignment #1.)

ü          Please specify your PL and platform in your email.

ü          Your team members

 

A programming assignment on some space-saving strategies for band alignment. (Each team <= 4 persons)

 

Please refer to http://www.csie.ntu.edu.tw/~kmchao/homework/seq04spr/ for band aligning programs and data.

 

1.      g_band_NW.c: O(NW)-space band-aligning program

2.      g_band_W.c: O(W)-space band-aligning program (Dividing by the middle row)

3.      g_band_cW.c: cW-space band-aligning program (c is a parameter given in the command.)

 

Optimize your object code and record its execution time (without counting the time for displaying the alignment). Compare g_band, g_band_NW, g_band_W, and g_band_cW. (Try different sequence lengths and diagonal band widths.)

 

Midterm Exam Problems (April 14, 2004)

 

Class presentations:

1.      Maximum number of team members: 2

2.      Each member is required to present in turn;

3.      Revised slides should be sent to me one week after the presentation;

4.      Every student is required to submit a brief note right after the presentation;

5.      Questions in class are always welcome;

 

Selected papers for presentation:

1.          S.F. Altschul, T. L. Madden, A.A. Sch{\"{a}}ffer, J. Zhang, Z. Zhang, W. Miller and D. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25 (1997), 3389--3402.
(4/21/2004)
²ø³Íµ¾ ±i¯Õ»¨
Slides

2.          Zheng Zhang, Piotr Berman, Thomas Wiehe, and Webb Miller Post-processing long pairwise alignments Bioinformatics 1999 15: 1012-1019.
(4/28/2004)
³¯±Ò·×
Slides


3.          a. MUMmer 2.1, PROmer, and NUCmer are described in Abstract or full text:
Fast Algorithms for Large-scale Genome Alignment and Comparison.  A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg.  Nucleic Acids Research (2002), Vol. 30, No. 11 2478-2483
b. MUMmer 1.0 is described in:
Alignment of Whole Genomes  (148K, PDF format) by A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg.  Nucleic Acids Research, 27:11 (1999), 2369-2376.
(5/5/2004)
³¯°ö¾±
Slides

4.          Gates W.H., and Papadimitriou, C.H. Bounds for sorting by prefix reversal. Discrete Math. 27 (1979), 47--57.
³\³Í¥­ ±i³Ç¥Í
Slides_1
Slides_2


Bafna, V., and Pevzner, P. A. "Genome Rearrangements and Sorting by Reversals". In IEEE FOCS 1994 , 148-157.
(5/12/2004)
·¨¨Îº·

(*5/19/2004, no class)

5.          U. Manber and E. Myers, ``Suffix Arrays: A New Method for On-Line String Searches,'' SIAM Journal on Computing 22, 5 (1993), 935-948.
J. Kärkkäinen, P. Sanders, ¡§Simple Linear Work Suffix Array Construction,¡¨ Proc. 13th International Conference on Automata, Languages and Programming (ICALP, 2003).
(5/26/2004)
³¯¬f¦~ ±i¥Ã¹F
Slides

6.          E. Myers and R. Durbin, ``A Table-Driven, Full-Sensitivity Similarity Search Algorithm," J. of Computational Biology, accepted. (Also appeared in: Proc. Workshop on Algorithms for BioInformatics (Rome, Italy 2002), 331-342.
(6/2/2004)
¤ý¨Î«n ¶Àà±®p
Slides


7.          V. Bafna and N. Edwards. "On De novo interpretation of peptide mass spectra". In RECOMB 2003, 9-18. (pdf)
(6/9/2004)
¶ÀÄ£§Ê ¤ý¥°­Û
Slides

 

References (Recommended, but not required):

1.          Introduction to Computational Molecular Biology, by Joao Carlos Setubal and Joao Meidanis (1996)

2.          Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)

3.          Biological Sequence Analysis, by Richard Durbin et al. (1998)

4.          Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)

5.          Related journal and conference papers

 

Some smart guys in this field (biological sequence analysis) you might wish to know:

Webb C. Miller

Michael S. Waterman

Russell F. Doolittle

Pavel Pevzner

Eugene W. Myers

David J. Lipman

Stephen F. Altschul

William R. Pearson

Samuel Karlin

Eric S. Lander

David Sankoff

Gary D. Stormo

Daniel M. Gusfield

Warren Gish

Vineet Bafna

Minoru Kanehisa (ª÷¤[  ¹ê)

Ming Li (§õ©ú)

Tao Jiang («¸ÀÜ)

Xiaoqiu Huang (¶À¾å¬î)

¡K to be continued.

Kun-Mao Chao (»¯©[­Z; ¤£¦n·N«ä ¤¬¬ÛºÙÆg¨D¶i¨B~~~)

 

Useful links:

1.          Molecular Biology for Computer Scientists by Lawrence Hunter

2.          Developing Bioinformatics Computer Skills by Cynthia Gibas & Per Jambeck

3.          The 2004 Database Issue of Nucleic Acids Research

4.          The 2003 Web Server Issue of Nucleic Acids Research

5.          Initial sequencing and analysis of the human genome
(
15 February 2001 Nature 409, 860 - 921 (2001))

6.          The Sequence of the Human Genome
(
Science 2001 February 16; 291: 1304-1351)

7.          An evaluation of the draft human genome sequence
(
Nature Genetics 29, 88 - 91 (01 Sep 2001) Letters)

8.          Initial sequencing and comparative analysis of the mouse genome
(Nature 420, 520 - 562 (2002))

9.          Sense from Sequences: Stephen F. Altschul on Bettering BLAST , July/August 2000 (A story about BLAST)

 

Bibliography links:

1.          PubMed
(
PubMed, a service of the National Library of Medicine, provides access to over 12 million MEDLINE citations back to the mid-1960's and additional life science journals. PubMed includes links to many sites providing full text articles and other related resources.)

2.          CiteSeer
(
ResearchIndex is a scientific literature digital library that aims to improve the dissemination and feedback of scientific literature, and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness.)