Course: Algorithms for Biological Sequence Analysis
Spring semester, 2004
Wednesday
3 credits
Web site: http://www.csie.ntu.edu.tw/~kmchao/seq04spr
Instructor: Kun-Mao Chao (»¯©[Z)
Teaching assistant: Rung-Ren Lin (ªL®e¥ô) email: r91054@csie.ntu.edu.tw
Prerequisites: Some basic knowledge on algorithm development and program design is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.
Coursework:
Programming assignments (25%)
Midterm exam (40%)
Final project (Oral presentation of selected papers) (30%)
Class participation (5%)
Our classmates ¡K
Classmates in TIGP (Bioinformatics)
Class PowerPoint slides:
Dynamic Programming Strategies
Assigned reading materials:
Lin, Y. ¡VL., Jiang, T. and Chao, K. ¡VM., 2002, ¡§Efficient
Algorithms for Locating the Length-Constrained Heaviest Segments, with
Applications to Biomolecular Sequence Analysis,¡¨ Journal of Computer and
System Sciences, 570-586.
(PDF file)
Fast
Algorithms for Finding Maximum-Density Segments of a Sequence with Applications
to Bioinformatics, M. H. Goldwasser,
M.-Y. Kao, and Hsueh-I Lu in Proceedings of the 2nd Workshop on Algorithms in
Bioinformatics (WABI 2002), Rome,
Italy, September 17-21, 2002, LNCS
2452, pp. 157-171. (A revised version is here.)
Chao, K. -M., Pearson, W. R. and Miller, W.
, 1992, Aligning Two
Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics),
8: 481-487.
E. Myers and W. Miller, ``Optimal Alignments in Linear Space,'' CABIOS 4, 1 (1988), 11-17.
Chao, K. -M., Hardison, R. C. and Miller,
W. , 1993, Locating
Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics),
9: 387-396.
Chao, K. -M., Hardison R. C. and Miller, W.
, 1994, Recent
Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.
Chao, K. -M., 1994, Computing All
Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science
807, 31-42, California, USA.
Class notes:
Chao, K.-M., ¡§Dynamic-programming strategies for analyzing biomolecular sequences.¡¨ (To be included in the book of "Selected Topics in Post-Genome Knowledge Discovery" (Chapter 1), 2004.)
Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, ¡§Multiple Alignment in Linear Space,¡¨ Manuscript.
Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, Constrained
Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.
Chao, K. -M., 1998, ¡§On Computing all Suboptimal Alignments,¡¨ Information Sciences, 105: 189-207.
S. Wu, E. Myers, U. Manber, and W. Miller, ``An O(NP) Sequence Comparison Algorithm,'' Information Processing Letters 35, 6 (1990), 317-323.
Chao, K. -M. and Miller, W. , 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.
Programming assignments: (Given
ü Penalties for late submissions: # of days ¡Ñ 5%
ü Please email your source code + execution file to TA
ü
Subject (¥D¦®): sequence HW_(n)
(For example, ¡§sequence HW_(1)¡¨ for assignment #1.)
ü Please specify your PL and platform in your email.
ü Your team members
A programming assignment on some space-saving strategies for band alignment. (Each team <= 4 persons)
Please refer to http://www.csie.ntu.edu.tw/~kmchao/homework/seq04spr/ for band aligning programs and data.
1. g_band_NW.c: O(NW)-space band-aligning program
2. g_band_W.c: O(W)-space band-aligning program (Dividing by the middle row)
3. g_band_cW.c: cW-space band-aligning program (c is a parameter given in the command.)
Optimize your object code and record its execution time (without counting the time for displaying the alignment). Compare g_band, g_band_NW, g_band_W, and g_band_cW. (Try different sequence lengths and diagonal band widths.)
Midterm Exam Problems (
Class presentations:
1. Maximum number of team members: 2
2. Each member is required to present in turn;
3. Revised slides should be sent to me one week after the presentation;
4. Every student is required to submit a brief note right after the presentation;
5. Questions in class are always welcome;
Selected papers for presentation:
1.
S.F. Altschul, T. L. Madden,
A.A. Sch{\"{a}}ffer, J. Zhang, Z. Zhang, W. Miller and D. Lipman, Gapped BLAST and
PSI-BLAST: a new generation of protein database search programs. Nucleic
Acids Research 25 (1997), 3389--3402.
(4/21/2004)
²ø³Íµ¾ ±i¯Õ»¨
Slides
2.
Zheng Zhang, Piotr Berman, Thomas Wiehe, and Webb Miller Post-processing
long pairwise alignments Bioinformatics
1999 15: 1012-1019.
(4/28/2004)
³¯±Ò·×
Slides
3.
a. MUMmer 2.1, PROmer, and NUCmer are described in Abstract
or full text:
Fast
Algorithms for Large-scale Genome Alignment and Comparison. A.L.
Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg. Nucleic Acids
Research (2002), Vol. 30, No. 11 2478-2483
b. MUMmer 1.0 is described in:
Alignment of Whole
Genomes (148K, PDF format) by A.L. Delcher, S. Kasif, R.D.
Fleischmann, J. Peterson, O. White, and S.L. Salzberg. Nucleic Acids
Research, 27:11 (1999), 2369-2376.
(5/5/2004)
³¯°ö¾±
Slides
4.
Gates W.H., and Papadimitriou,
C.H. Bounds for sorting by prefix reversal. Discrete Math. 27 (1979), 47--57.
³\³Í¥ ±i³Ç¥Í
Slides_1
Slides_2
Bafna, V., and Pevzner, P. A. "Genome
Rearrangements and Sorting by Reversals". In IEEE FOCS 1994 ,
148-157.
(5/12/2004)
·¨¨Îº·
(*
5.
U. Manber and E. Myers, ``Suffix Arrays: A
New Method for On-Line String Searches,''
J. Kärkkäinen, P. Sanders, ¡§Simple Linear Work Suffix Array Construction,¡¨ Proc.
13th International Conference on Automata, Languages and Programming
(ICALP, 2003).
(5/26/2004)
³¯¬f¦~ ±i¥Ã¹F
Slides
6.
E. Myers and R. Durbin, ``A Table-Driven,
Full-Sensitivity Similarity Search Algorithm," J. of Computational
Biology, accepted. (Also appeared in: Proc. Workshop on Algorithms for
BioInformatics (
(6/2/2004)
¤ý¨Î«n ¶Àà±®p
Slides
7.
V. Bafna and N. Edwards.
"On De novo interpretation of peptide mass spectra". In RECOMB
2003, 9-18. (pdf)
(6/9/2004)
¶ÀÄ£§Ê ¤ý¥°Û
Slides
References (Recommended, but not required):
1. Introduction to Computational Molecular Biology, by Joao Carlos Setubal and Joao Meidanis (1996)
2. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
3. Biological Sequence Analysis, by Richard Durbin et al. (1998)
4. Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
5. Related journal and conference papers
Some smart guys in this field (biological sequence analysis) you might wish to know:
Minoru Kanehisa (ª÷¤[ ¹ê)
Ming Li (§õ©ú)
Tao Jiang («¸ÀÜ)
Xiaoqiu Huang (¶À¾å¬î)
¡K to be continued.
Kun-Mao Chao (»¯©[Z; ¤£¦n·N«ä ¤¬¬ÛºÙÆg¨D¶i¨B~~~)
Useful links:
1.
Molecular Biology for Computer
Scientists by
2. Developing Bioinformatics Computer Skills by Cynthia Gibas & Per Jambeck
3. The 2004 Database Issue of Nucleic Acids Research
4. The 2003 Web Server Issue of Nucleic Acids Research
5.
Initial
sequencing and analysis of the human genome
(
6.
The
Sequence of the Human Genome
(Science 2001
February 16; 291: 1304-1351)
7.
An
evaluation of the draft human genome sequence
(Nature Genetics 29, 88 - 91 (
8.
Initial
sequencing and comparative analysis of the mouse genome
(Nature 420, 520
- 562 (2002))
9. Sense from Sequences: Stephen F. Altschul on Bettering BLAST , July/August 2000 (A story about BLAST)
Bibliography links:
1.
PubMed
(PubMed, a
service of the National Library of Medicine, provides access to over 12 million
MEDLINE citations back to the mid-1960's and additional life science journals.
PubMed includes links to many sites providing full text articles and other
related resources.)
2.
CiteSeer
(ResearchIndex is a scientific literature digital
library that aims to improve the dissemination and feedback of scientific
literature, and to provide improvements in functionality, usability,
availability, cost, comprehensiveness, efficiency, and timeliness.)