Course: Algorithms for Biological Sequence Analysis
Fall semester, 2017
9:10 – 12:10 Tuesday, R107@CSIE Building
3 credits

Website: http://www.csie.ntu.edu.tw/~kmchao/seq17fall

Instructor: Kun-Mao Chao (趙坤茂)

Teaching assistant: Yi-Chen Huang (黃亦晨)  r06945025 followed by @ntu.edu.tw
[TA's office hours: by appointment; Venue: R432]

 

Prerequisites: Some basic knowledge on algorithms is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.

 

Classmates: I  II  III  IV  V  VI

 

Coursework:

    Homework assignments and Class participation (10%)

    Two midterm exams (70%; 35% each):

        1. Midterm #1: Oct. 31, 2017

        2. Midterm #2: Dec. 12, 2017

    Oral presentation of selected papers/projects (20%)

 

Supporting materials:

    About This Class     Survey [2017/9/12]

    Interesting Sequences [2017/9/12]

    A Whirlwind Tour of Bioinformatics [2017/9/19]

    Basic Algorithmic Strategies [2017/9/19; 2017/9/26; 2017/10/3]

              -- Basic Algorithmic Strategies [2017/9/19; 2017/9/26; 2017/10/3]

              -- LIS example [2017/9/26]

              -- LCS_example [2017/10/3]

    Large-Scale Metagenomic Sequence Clustering and Inference of Environment-Microbe and Microbe-Microbe Associations [2017/9/26]

    Sequence Alignment [2017/10/3; 2017/10/17; 2017/10/24; up to page 49]

-- Global Alignment [2017/10/3; 2017/10/17]

-- Local Alignment [2017/10/3; 2017/10/17]

-- Various Scoring Schemes [2017/10/3; 2017/10/17; 2017/10/24]

-- An affine-gap-penalty example [2017/10/3; 2017/10/17; 2017/10/24]

-- Example 2; Solution [2017/10/3; 2017/10/17; 2017/10/24]

-- Scoring scheme examples [2017/10/3; 2017/10/17]

-- Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53.

-- Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197.

-- Gotoh, Osamu: An improved algorithm for matching biological sequences. In: Journal of Molecular Biology. 162, 1982, S. 705-708 (PDF, 206 KB).

    Space-Saving Strategies [2017/10/24; up to page 23]

    Suboptimal Alignments [2017/11/07]

-- Space-Saving Strategies (pdf) [2017/10/24; 2017/11/07]

-- A Note for Method 7@Suboptimal Alignments [2017/12/2]

-- Eugene Myers and Webb Miller, "Optimal Alignments in Linear Space," CABIOS (Bioinformatics) 9: 169-176, 1988. [pdf] [2017/10/24]

-- Chao, K. -M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.

-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.

-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.

-- Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA. [2017/11/07]

-- Chao, K. -M., Hardison R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291. [2017/10/24; 2017/11/07]

-- Chao, K.-M. and Miller, W., 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.

    Multiple Sequence Alignment [2017/11/14]

    -- Multiple Sequence Alignment [2017/11/14]

Homology Search [2017/11/14; 2017/11/21; pp. 1-12, 23-33, 49]

-- Homology Search Tools (pdf) [2017/11/14; 2017/11/21]

   Genome Reconstruction by Phillip E. C. Compeau and Pavel A. Pevzner [2017/11/28]

            pdf (the first one in the link), ppt (the third one in the link)
(For ppt, you may skip pages 125-175 & 301-340.)

    -- B(2,3) [2017/11/28]

    -- Eulerian Tours by Christos H. Papadimitriou & Umesh Vazirani [2017/]
(Exchange 10 and 01 of the left figure on Page 4)

    -- An example for fragment assembly as an Eulerian cycle problem [2017/11/28]

    Pattern Identification in a Haplotype Block (ppt, pdf) [2017/12/5; up to p. 22]

              -- An example [2017/12/5]

 

    Haplotype Inference (ppt, pdf) [2017/12/5]
    (Read the problem formulation on pp. 1262-1263 of the pdf file)

    -- Some other ways of formulation [2017/12/5]

    Heaviest Segments []

   -- A Note on Computing Heaviest Segments by Kun-Mao Chao []

    RMSQ []

-- Chen, K.-Y. and Chao, K.-M.,  2007, “On the Range Maximum-Sum Segment Query Problem,” Discrete Applied Mathematics, 155: 2043-2052. []

Homework assignments:

 

#1: Handout: September 12, 2017; Due: September 29, 2017

    Send TA a PowerPoint page including: (Subject: [HW1] Your student ID number)

   A sample solution

   More sample solutions: 2007  2008  2009  2011 2015 2016

   2017

 

Class presentations:

1.      The expected number of team members: <6;

2.      Each member is required to present in turn [about 150/(the number of speakers on the same day) minutes each];

3.      Revised slides should be sent to me within one week after the presentation. Please compress your figures.

4.      Questions in class are always welcome.

 

Dec. 19, 2017

Technology Feature | The singles seen: Sequencing gets specific

Alan Dove, Science  03 Nov 2017: Vol. 358, Issue 6363, pp. 677-679.

Karla Hörmann (胡可萌)

Slides

 

How are DNAs woven into chromosomes?

Kim Nasmyth, Science  03 Nov 2017: Vol. 358, Issue 6363, pp. 589-590.

林品君  林雅琦

Slides

 

Competing chromosomes explain junk DNA

Francis J. McNally, Science  03 Nov 2017: Vol. 358, Issue 6363, pp. 594-595.

張家瑜  王擎天  朱柏昇

Slides

Dec. 26, 2017

Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago

Carina M. Schlebusch et al., Science  03 Nov 2017: Vol. 358, Issue 6363, pp. 652-655.

陳柏佑  楊茂榮

Slides

 

Detecting evolutionary forces in language change

Mitchell G. Newberry et al., Nature, Published online: 01 November 2017.

張辰

Slides

 

Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates

Tobias Guldberg Frøslev et al., Nature Communications 8, 1188.

郭芛宏

 

Neptune: a bioinformatics tool for rapid discovery of genomic variation in bacterial populations.

Marinier E et al., Nucleic Acids Res. 2017 Oct 13;45(18):e159.

戴嘉星  董文捷

Slides

 

Graphtyper enables population-scale genotyping using pangenome graphs.

Eggertsson HP et al., Nat Genet. 2017 Nov;49(11):1654-1660.

陳至侃  陳鵬宇

Slides

Jan. 2, 2018

Improved algorithmic complexity for the 3SEQ recombination detection algorithm.

Lam HM, Ratmann O, Boni MF., Mol Biol Evol. 2017 Oct 3.

樂正  張耿健  張馭荃  葉光哲

Slides

 

Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing.

Orenstein Y et al., PLoS Comput Biol. 2017 Oct 2;13(10):e1005777.

黃柏文  陳泓宇

Slides

 

The SeqAn C++ template library for efficient sequence analysis: A resource for programmers.

Reinert K et al., J Biotechnol. 2017 Nov 10;261:157-168.

張高登  松井孔明

 

Selected papers for presentation:

(You may access these selected articles using computers with NTU IP addresses.)

1.          Technology Feature | The singles seen: Sequencing gets specific
Alan Dove, Science  03 Nov 2017: Vol. 358, Issue 6363, pp. 677-679.

2.          How are DNAs woven into chromosomes?
Kim Nasmyth, Science  03 Nov 2017: Vol. 358, Issue 6363, pp. 589-590.

3.          Competing chromosomes explain junk DNA
Francis J. McNally, Science  03 Nov 2017: Vol. 358, Issue 6363, pp. 594-595.

4.          Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago
Carina M. Schlebusch et al., Science  03 Nov 2017: Vol. 358, Issue 6363, pp. 652-655.

5.          Detecting evolutionary forces in language change
Mitchell G. Newberry et al., Nature, Published online: 01 November 2017.

6.          Dense and accurate whole-chromosome haplotyping of individual genomes
David Porubsky et al., Nature Communications 8, 1293.

7.          Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates
Tobias Guldberg Frøslev et al., Nature Communications 8, 1188.

8.          PLATO software provides analytic framework for investigating complexity beyond genome-wide association studies
Molly A. Hall et al., Nature Communications 8, 1167.

9.          Neptune: a bioinformatics tool for rapid discovery of genomic variation in bacterial populations.
Marinier E et al., Nucleic Acids Res. 2017 Oct 13;45(18):e159.

10.      Improved algorithmic complexity for the 3SEQ recombination detection algorithm.
Lam HM, Ratmann O, Boni MF., Mol Biol Evol. 2017 Oct 3.

11.      Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing.
Orenstein Y et al., PLoS Comput Biol. 2017 Oct 2;13(10):e1005777.

12.      Graphtyper enables population-scale genotyping using pangenome graphs.
Eggertsson HP et al., Nat Genet. 2017 Nov;49(11):1654-1660.

13.      The SeqAn C++ template library for efficient sequence analysis: A resource for programmers.
Reinert K et al., J Biotechnol. 2017 Nov 10;261:157-168.

14.      kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity.
Murray KD, et al., PLoS Comput Biol. 2017 Sep 5;13(9):e1005727.

 

References (Recommended, but not required):
1. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
2. Biological Sequence Analysis, by Richard Durbin et al. (1998)
3. Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
4. An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel Pevzner (2004)
5. Sequence Comparison: Theory and Methods,” by Chao, K.-M. and Zhang, L. (2009)
6. Bioinformatics for Biologists, edited by Pavel Pevzner and Ron Shamir (2011)
7. Bioinformatics Algorithms: an Active Learning Approach, Vol. I, II by Phillip E. C. Compeau and Pavel A. Pevzner (August 2015)

[http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track]

8. Related journal and conference papers
 
Bibliography links: