Course: Algorithms for Biological Sequence Analysis
Spring semester, 2003
Monday
3 credits
Web site: http://www.csie.ntu.edu.tw/~kmchao/seq2003
Instructor: Kun-Mao Chao (»¯©[Z)
Teaching assistant: Hwa-Sheng Chiu (ªô¾ìÁn) email: r91031@csie.ntu.edu.tw
Prerequisites: Some basic knowledge on algorithm development and program design is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.
Coursework:
Programming assignments (25%)
Midterm exam (40%)
Final project (Oral presentation of selected papers) (30%)
Class participation (5%)
Our class PowerPoint slides:
Lecture 1: Never-ending stories (
Lectures 2-x: Dynamic-programming strategies for analyzing
biomolecular sequences (
My class notes:
Chao, K.-M., ¡§Dynamic-programming strategies for analyzing
biomolecular sequences.¡¨ (
Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, ¡§Multiple Alignment in Linear Space,¡¨ Manuscript.
Lin, Y. ¡VL., Jiang, T. and Chao, K. ¡VM., 2002, ¡§Efficient
Algorithms for Locating the Length-Constrained Heaviest Segments, with
Applications to Biomolecular Sequence Analysis,¡¨ Journal of Computer and
System Sciences, 570-586.
(PDF file)
Chao, K. -M., Pearson, W. R. and Miller, W.
, 1992, Aligning Two
Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics),
8: 481-487.
Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, Constrained
Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.
Chao, K. -M., Hardison, R. C. and Miller,
W. , 1993, Locating
Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics),
9: 387-396.
Chao, K. -M., Hardison R. C. and Miller, W.
, 1994, Recent
Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.
Chao, K. -M., 1994, Computing All
Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science
807, 31-42, California, USA.
Chao, K. -M., 1998, ¡§On Computing all
Suboptimal Alignments,¡¨ Information
Sciences, 105: 189-207.
Programming assignments:
ü Penalties for late submissions: # of days ¡Ñ 5%
ü Please email your source code + execution file to TA
ü
Subject (¥D¦®): sequence HW_(n)
(For example, ¡§sequence HW_(1)¡¨ for assignment #1.)
ü
Compress your files:
e.g., r91922031_ªô¾ìÁn_HW1.zip
ü Please specify your PL and platform in your email.
Assignment #1 (
(Correctness: 75%; Linear-time implementation: 15%; Comments: 10%)
Asignments #2 and #3 (March 17, 2003,
Due:
Midterm (
Selected papers for presentation:
1. S.F. Altschul, T. L. Madden, A.A. Sch{\"{a}}ffer, J. Zhang, Z. Zhang, W. Miller and D. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25 (1997), 3389--3402. (4/21/2003) ±i¨|ºa
2.
Miller, W.,
2001, ¡§Comparison
of genomic DNA sequences: solved and unsolved problems¡¨, Bioinformatics,
17: 391-397. (4/28/2003) ¸«í«C ´åÄË·¢
Zheng Zhang, Piotr Berman, Thomas Wiehe, and Webb Miller Post-processing
long pairwise alignments Bioinformatics
1999 15: 1012-1019. (4/28/2003) ³¯¤¨
3.
Chao, K. -M. and Miller, W. , 1995, Linear-Space
Algorithms that Build Local Alignments from Fragments, Algorithmica, 13:
106-134. (5/5/2003)
ªL®e¥ô ¾G©ø©y
4.
a. MUMmer 2.1, PROmer, and NUCmer are described in Abstract
or full text:
Fast
Algorithms for Large-scale Genome Alignment and Comparison. A.L.
Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg. Nucleic Acids
Research (2002), Vol. 30, No. 11 2478-2483 (5/12/2003)
b. MUMmer 1.0 is described in:
Alignment of Whole
Genomes (148K, PDF format) by A.L. Delcher, S. Kasif, R.D.
Fleischmann, J. Peterson, O. White, and S.L. Salzberg. Nucleic Acids
Research, 27:11 (1999), 2369-2376.
³¯´©M §f¾Ç§
´¿§²E
5.
Wu, B. Y., Lancia, G., Bafna, V., Chao, K. -M., Ravi, R. and Tang,
C. Y. , 2000, A Polynomial-Time
Approximation Scheme for Minimum Routing Cost Spanning Trees, SIAM Journal on Computing, 29: 761-778. (5/19/2003 à 5/12/2003)
±i¥|ºû
Jakob Skou Pedersen and Jotun Hein Gene
finding with a hidden Markov model of genome structure and evolution Bioinformatics 2003 19: 219-227. (5/19/2003 à 6/2/2003) ³¯·z¤å ªL®ÑÂE
6.
U. Manber and E. Myers, ``Suffix Arrays: A
New Method for On-Line String Searches,''
¬x«T¦Ë
7. E. Myers, ``Whole-Genome DNA Sequencing,'' IEEE Computational Engineering and Science 3, 1 (1999), 33-43. (6/2/2003) ±i¤Ñ»¨ §õ«W¼ü
8. E. Myers and R. Durbin, ``A Table-Driven, Full-Sensitivity Similarity Search Algorithm," J. of Computational Biology, accepted. (Also appeared in: Proc. Workshop on Algorithms for BioInformatics (Rome, Italy 2002), 331-342. (6/9/2003) ·¨¤@°a ¶ÀÄ_¸©
References (Recommended, but not required):
1. Introduction to Computational Molecular Biology, by Joao Carlos Setubal and Joao Meidanis (1996)
2. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
3. Biological Sequence Analysis, by Richard Durbin et al. (1998)
4. Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
5. Related journal and conference papers
Useful links:
1. Molecular Biology for Computer Scientists by Lawrence Hunter
2. Developing Bioinformatics Computer Skills by Cynthia Gibas & Per Jambeck
3. The 2003 Database Issue of Nucleic Acids Research
4.
Initial
sequencing and analysis of the human genome
(15 February 2001 Nature 409, 860 - 921 (2001))
5.
The
Sequence of the Human Genome
(Science 2001
February 16; 291: 1304-1351)
6.
An
evaluation of the draft human genome sequence
(Nature Genetics29, 88 - 91 (01 Sep 2001)
Letters)
7.
Initial
sequencing and comparative analysis of the mouse genome
(Nature 420, 520
- 562 (2002))
8. Sense from Sequences: Stephen F. Altschul on Bettering BLAST July/August 2000 (A story about BLAST)
Bibliography links:
1.
PubMed
(PubMed, a
service of the National Library of Medicine, provides access to over 12 million
MEDLINE citations back to the mid-1960's and additional life science journals.
PubMed includes links to many sites providing full text articles and other
related resources.)
2.
CiteSeer
(ResearchIndex is a scientific literature digital
library that aims to improve the dissemination and feedback of scientific
literature, and to provide improvements in functionality, usability,
availability, cost, comprehensiveness, efficiency, and timeliness.)