Course: Algorithms for Biological Sequence Analysis

Spring semester, 2003

Monday 2:10 ¡V 5:00 PM, 105 CSIE Building.

3 credits

Web site: http://www.csie.ntu.edu.tw/~kmchao/seq2003

Instructor: Kun-Mao Chao (»¯©[­Z)

Teaching assistant: Hwa-Sheng Chiu (ªô¾ìÁn)  email: r91031@csie.ntu.edu.tw

 

Prerequisites: Some basic knowledge on algorithm development and program design is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.

 

Coursework:

Programming assignments (25%)

Midterm exam (40%)

Final project (Oral presentation of selected papers) (30%)

Class participation (5%)

 

Our classmates (I)

Our classmates (II)

Our classmates (III)

 

Our class PowerPoint slides:

Lecture 1: Never-ending stories (2/17/2003)

Lectures 2-x: Dynamic-programming strategies for analyzing biomolecular sequences (2/24/2003 - ?) (It will be revised later.)

 

My class notes:

Chao, K.-M., ¡§Dynamic-programming strategies for analyzing biomolecular sequences.¡¨ (Feb. 23, 2003) (To be included in "Selected Topics in Post-Genome Knowledge Discovery".)

Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, ¡§Multiple Alignment in Linear Space,¡¨ Manuscript.

Lin, Y. ¡VL., Jiang, T. and Chao, K. ¡VM., 2002, ¡§Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis,¡¨ Journal of Computer and System Sciences, 570-586.
(PDF file)

Chao, K. -M., Pearson, W. R. and Miller, W. , 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.

Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.

Chao, K. -M., Hardison, R. C. and Miller, W. , 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.

Chao, K. -M., Hardison R. C. and Miller, W. , 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.

Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA.

Chao, K. -M., 1998, ¡§On Computing all Suboptimal Alignments,¡¨ Information Sciences, 105: 189-207.

 

 

Programming assignments:

ü          Penalties for late submissions: # of days ¡Ñ 5%

ü          Please email your source code + execution file to TA

ü          Subject (¥D¦®): sequence HW_(n)
(For example, ¡§sequence HW_(1)¡¨ for assignment #1.)

ü          Compress your files:
e.g., r91922031_
ªô¾ìÁn_HW1.zip

ü          Please specify your PL and platform in your email.

 

Assignment #1 (Feb. 24, 2003, Due: March 10, 2003)

(Correctness: 75%; Linear-time implementation: 15%; Comments: 10%)

A Big Test File  Answer

 

Asignments #2 and #3 (March 17, 2003, Due: March 31, 2003 and April 7, 2003)

 

Midterm (April 14, 2003)

 

Scoreboard

 

Selected papers for presentation:

1.          S.F. Altschul, T. L. Madden, A.A. Sch{\"{a}}ffer, J. Zhang, Z. Zhang, W. Miller and D. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25 (1997), 3389--3402. (4/21/2003) ±i¨|ºa

2.          Miller, W., 2001, ¡§Comparison of genomic DNA sequences: solved and unsolved problems¡¨, Bioinformatics, 17: 391-397. (4/28/2003) ¸­«í«C ´åÄË·¢
Zheng Zhang, Piotr Berman, Thomas Wiehe, and Webb Miller Post-processing long pairwise alignments Bioinformatics 1999 15: 1012-1019. (4/28/2003) ³¯¤¨

3.          Chao, K. -M. and Miller, W. , 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134. (5/5/2003)
ªL®e¥ô ¾G©ø©y

4.          a. MUMmer 2.1, PROmer, and NUCmer are described in Abstract or full text:
Fast Algorithms for Large-scale Genome Alignment and Comparison.  A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg.  Nucleic Acids Research (2002), Vol. 30, No. 11 2478-2483 (5/12/2003)
b. MUMmer 1.0 is described in:
Alignment of Whole Genomes  (148K, PDF format) by A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg.  Nucleic Acids Research, 27:11 (1999), 2369-2376.
³¯´­©M §f¾Ç§ ´¿­§²E

5.          Wu, B. Y., Lancia, G., Bafna, V., Chao, K. -M., Ravi, R. and Tang, C. Y. , 2000, A Polynomial-Time Approximation Scheme for Minimum Routing Cost Spanning Trees, SIAM Journal on Computing, 29: 761-778. (5/19/2003 à 5/12/2003)
±i¥|ºû
Jakob Skou Pedersen and Jotun Hein Gene finding with a hidden Markov model of genome structure and evolution Bioinformatics 2003 19: 219-227. (5/19/2003 à 6/2/2003) ³¯·z¤å ªL®ÑÂE

6.          U. Manber and E. Myers, ``Suffix Arrays: A New Method for On-Line String Searches,'' SIAM Journal on Computing 22, 5 (1993), 935-948. (5/26/2003)
¬x«T¦Ë

7.          E. Myers, ``Whole-Genome DNA Sequencing,'' IEEE Computational Engineering and Science 3, 1 (1999), 33-43. (6/2/2003) ±i¤Ñ»¨ §õ«W¼ü

8.          E. Myers and R. Durbin, ``A Table-Driven, Full-Sensitivity Similarity Search Algorithm," J. of Computational Biology, accepted. (Also appeared in: Proc. Workshop on Algorithms for BioInformatics (Rome, Italy 2002), 331-342. (6/9/2003) ·¨¤@°a ¶ÀÄ_¸©

 

References (Recommended, but not required):

1.          Introduction to Computational Molecular Biology, by Joao Carlos Setubal and Joao Meidanis (1996)

2.          Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)

3.          Biological Sequence Analysis, by Richard Durbin et al. (1998)

4.          Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)

5.          Related journal and conference papers

 

Useful links:

1.          Molecular Biology for Computer Scientists by Lawrence Hunter

2.          Developing Bioinformatics Computer Skills by Cynthia Gibas & Per Jambeck

3.          The 2003 Database Issue of Nucleic Acids Research

4.          Initial sequencing and analysis of the human genome
(
15 February 2001 Nature 409, 860 - 921 (2001))

5.          The Sequence of the Human Genome
(
Science 2001 February 16; 291: 1304-1351)

6.          An evaluation of the draft human genome sequence
(
Nature Genetics29, 88 - 91 (01 Sep 2001) Letters)

7.          Initial sequencing and comparative analysis of the mouse genome
(Nature 420, 520 - 562 (2002))

8.          Sense from Sequences: Stephen F. Altschul on Bettering BLAST  July/August 2000 (A story about BLAST)

 

Bibliography links:

1.          PubMed
(
PubMed, a service of the National Library of Medicine, provides access to over 12 million MEDLINE citations back to the mid-1960's and additional life science journals. PubMed includes links to many sites providing full text articles and other related resources.)

2.          CiteSeer
(
ResearchIndex is a scientific literature digital library that aims to improve the dissemination and feedback of scientific literature, and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness.)