Course: Algorithms for Biological Sequence Analysis

Fall semester, 2015

9:10 - 12:10 Monday, R105@CSIE Building

3 credits

Web site: http://www.csie.ntu.edu.tw/~kmchao/seq15fall

Instructor: Kun-Mao Chao (趙坤茂)

Teaching assistant: Kevin Lin (林柏廷; )
[TA's office hours: 3:00-5:00 PM, Tuesdays; Venue: R432]

 

Prerequisites: Some basic knowledge on algorithms is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.

 

Classmates: I  II  III  IV  V

 

Coursework:

    Homework assignments and Class participation (10%)

    Two midterm exams (70%; 35% each):

        1. Midterm #1: Nov. 2, 2015

        2. Midterm #2: Dec. 14, 2015

    Oral presentation of selected papers/projects (20%)

 

Supporting materials:

    About This Class     Survey [2015/9/14]

    Interesting Sequences [2015/9/14; hyperlink revised on 2015/9/20]

    A Whirlwind Tour of Bioinformatics [2015/9/21]

    Basic Algorithmic Strategies [2015/9/21; 2015/10/5]

              -- Basic Algorithmic Strategies [2015/9/21; 2015/10/5]

              -- LIS example [2015/10/5]

 

    Sequence Alignment [2015/10/5; 2015/10/12; 2015/10/19]

-- Global Alignment [2015/10/5; 2015/10/12]

-- Local Alignment [2015/10/12]

-- Various Scoring Schemes [2015/10/12]

-- An affine-gap-penalty example [2015/10/12]

-- Scoring scheme examples [2015/10/12]

-- Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53.

-- Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197.

-- Gotoh, Osamu: An improved algorithm for matching biological sequences. In: Journal of Molecular Biology. 162, 1982, S. 705-708 (PDF, 206 KB).

    Space-Saving Strategies [2015/10/19]

    Suboptimal Alignments [2015/10/26]

-- Space-Saving Strategies (pdf) [2015/10/19; 2015/10/26]

-- Eugene Myers and Webb Miller, "Optimal Alignments in Linear Space," CABIOS (Bioinformatics) 9: 169-176, 1988. [pdf] [2015/10/19]

-- Chao, K. -M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.

-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.

-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.

-- Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA. [2015/10/26]

-- Chao, K. -M., Hardison R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291. [2015/10/19; 2015/10/26]

-- Chao, K.-M. and Miller, W., 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.

    Multiple Sequence Alignment [2015/11/9]

    -- Multiple Sequence Alignment [2015/11/9]

Homology Search [2015/11/9; 2015/11/16]

-- Homology Search Tools (pdf) [2015/11/9; 2015/11/16]

   Genome Reconstruction by Phillip E. C. Compeau and Pavel A. Pevzner [2015/11/23]

            pdf (the first one in the link), ppt (the third one in the link)
(For ppt, you may skip pages 125-175 & 301-350.)

    -- B(2,3) [2015/11/23]

    -- Eulerian Tours by Christos H. Papadimitriou & Umesh Vazirani [2015/11/23]
(Exchange 10 and 01 of the left figure on Page 4)

    -- An example for fragment assembly as an Eulerian cycle problem [2015/11/23]

    Pattern Identification in a Haplotype Block (ppt, pdf) [2015/11/30]

              -- An example [2015/11/30]

 

    Haplotype Inference (ppt, pdf) [2015/11/30]
    (Read the problem formulation on pp. 1262-1263 of the pdf file)

    -- Some other ways of formulation [2015/11/30]

    Heaviest Segments [2015/12/07]

   -- A Note on Computing Heaviest Segments by Kun-Mao Chao [2015/12/07]

    RMSQ [2015/12/07]

-- Chen, K.-Y. and Chao, K.-M.,  2007, “On the Range Maximum-Sum Segment Query Problem,” Discrete Applied Mathematics, 155: 2043-2052. [2015/12/07]

Homework assignments:

 

#1: Handout: September 14, 2015; Due: September 30, 2015

    Send TA a PowerPoint page including: (Subject: [HW1] Your student ID number)

   A sample solution

   More sample solutions: 2007  2008  2009  2011

   2015 [2015/10/5]

 

Class presentations:

1.      The expected number of team members: <6;

2.      Each member is required to present in turn [about 150/(the number of speakers on the same day) minutes each];

3.      Revised slides should be sent to me within one week after the presentation. Please compress your figures.

4.      Questions in class are always welcome.

 

Selected papers for presentation:

(You may access these selected articles using computers with NTU IP addresses.)

 

Dec. 21, 2015

 

Richard Van Noorden, Brendan Maher, and Regina Nuzzo, "The top 100 papers--Nature explores the most-cited research of all time," Nature 514, 550–553, 2014.

姜莉玫  姜智偉  潘世璋  蔡智晴

 

BLAST

林子源  郭育辰  林展翔  林岳徵  楊鑫平  劉明宏

 

Dec. 28, 2015

 

Joshua G. Schraiber and Joshua M. Akey, Methods and models for unravelling human evolutionary history, Nature Reviews Genetics (Published online 10 November 2015).

姜立垣  林劭軒  羅玳琳  沈昊平

 

M. Gallego Llorente, et al., Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent, ScienceVol. 350 no. 6262 pp. 820-822.

陳琤  黃佑仁  宋欣烜  趙叡  謝于琳  楊雙丞

 

Jan. 4, 2016

 

Nicholas J. Loman and Mark J. Pallen, Twenty years of bacterial genome sequencing, Nature Reviews Microbiology (Published online 09 November 2015).

張文  林碩彥  丁名初  藍挺瑋  彭治學

 

John R. McCarrey, The epigenome—a family affair, Science 6 November 2015: Vol. 350 no. 6261 pp. 634-635.

王曉玲  林蔚城  陳建禎  莊智堯  鄭佳銘  沈泓宇

 

More:

Pearson, WR., Selecting the Right Similarity-Scoring Matrix., Curr Protoc Bioinformatics. 2013.

Wang, Q., Korkin, D., and Shang, Y., “A Fast Multiple Longest Common Subsequence (MLCS) Algorithm,” IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 23(3), 2011.

Li, H. and Durbin, R., Fast and accurate short read alignment with Burrows-Wheeler transform
Bioinformatics. 2009 Jul 15;25(14):1754-60.

Gates W.H., and Papadimitriou, C.H. Bounds for sorting by prefix reversal. Discrete Math. 27 (1979), 47--57.

 

References (Recommended, but not required):
1. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
2. Biological Sequence Analysis, by Richard Durbin et al. (1998)
3. Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
4. An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel Pevzner (2004)
5. Sequence Comparison: Theory and Methods,” by Chao, K.-M. and Zhang, L. (2009)
6. Bioinformatics for Biologists, edited by Pavel Pevzner and Ron Shamir (2011)
7. Bioinformatics Algorithms: an Active Learning Approach, Vol. I, II by Phillip E. C. Compeau and Pavel A. Pevzner (August 2015)

[http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track]

8. Related journal and conference papers
 
Useful links:
  1. NCBI    
  2. NCBI Education
  3. Molecular Biology for Computer Scientists by Lawrence Hunter
  4.  Developing Bioinformatics Computer Skills by Cynthia Gibas & Per Jambeck
  5. Sense from Sequences: Stephen F. Altschul on Bettering BLAST , July/August 2000 (A story about BLAST)
  6. Initial sequencing and analysis of the human genome (15 February 2001 Nature 409, 860 - 921 (2001)
  7.  The Sequence of the Human Genome (Science 2001 February 16; 291: 1304-1351)
  8. An evaluation of the draft human genome sequence (Nature Genetics 29, 88 - 91 (01 Sep 2001) Letters)
  9.  Initial sequencing and comparative analysis of the mouse genome (Nature 420, 520 - 562 (2002))
  10. The UCSC Genome Browser
  11. The Chimpanzee Genome (Nature; Sept. 1, 2005)
  12. The Chimpanzee Sequencing and Analysis Consortium25, "Initial sequence of the chimpanzee genome and comparison with the human genome," Nature 437, 69-87 (1 September 2005) | doi: 10.1038/nature04072
  13. The HapMap Project (Nature; Oct. 27, 2005)
    Please visit:
    http://www.nature.com/nature/journal/v437/n7063/index.html
     
    and in particular the following research articles:
    http://www.nature.com/nature/journal/v437/n7063/index.html#Article
Bibliography links: