Course: Algorithms for Biological Sequence Analysis
Fall semester, 2020
9:10 – 12:10 Tuesday, R107@CSIE Building
3 credits

Website: http://www.csie.ntu.edu.tw/~kmchao/seq20fall

Instructor: Kun-Mao Chao (趙坤茂)

Teaching assistant:  百榆 r07922177 followed by @ntu.edu.tw
[TA's office hours: by appointment; Venue: R432]

 

Prerequisites: Some basic knowledge on algorithms is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.

 

Classmates: I  II  III  IV  V

 

Coursework:

    Homework assignments and Class participation (10%)

        #1: Handout: September 15, 2020; Due: October 2, 2020 (See below)

    Two midterm exams (70%; 35% each):

        Midterm #1: Oct. 27, 2020
       
Midterm #2: Dec. 8, 2020

    Oral presentation of selected papers/projects (20%)

        [Dec. 15 for preparation],
        Dec. 22, Dec. 29, Jan. 5.

 

Supporting materials:

    About This Class     Survey [2020/9/15]

    Interesting Sequences [2020/9/15]

    A Whirlwind Tour of Bioinformatics [2020/9/15]

    Basic Algorithmic Strategies [2020/9/22]
 

              -- Basic Algorithmic Strategies [2020/9/22]

              -- Vertex_Cover_Greedy_Fail [2020/9/22]

              -- LIS example [2020/9/22]

              -- LCS_example [2020/9/22]

 

    Sequence Alignment [2020/9/22, 2020/9/29, 2020/10/6]

-- Global Alignment [2020/9/22, 2020/9/29]

-- A Note on Computing Heaviest Segments [2020/9/29]

-- Local Alignment [2020/9/29]

-- Various Scoring Schemes [2020/9/29, 2020/10/6]

-- An affine-gap-penalty example [2020/9/29]

-- Example 2; Solution [2020/9/29]

-- Scoring scheme examples [2020/9/29]

-- Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53.

-- Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197.

-- Gotoh, Osamu: An improved algorithm for matching biological sequences. In: Journal of Molecular Biology. 162, 1982, S. 705-708 (PDF, 206 KB).

    Space-Saving Strategies [2020/10/6, 2020/10/13]

    Suboptimal Alignments [2020/10/13]

-- Space-Saving Strategies  [2020/10/6, 2020/10/13]

-- A Note for Method 7@Suboptimal Alignments [2020/10/13]

-- More on Method 7@Suboptimal Alignments [2020/11/3]

-- Eugene Myers and Webb Miller, "Optimal Alignments in Linear Space," CABIOS (Bioinformatics) 9: 169-176, 1988. [pdf]

-- Chao, K. -M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.

-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.

-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.

-- Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA.

-- Chao, K. -M., Hardison R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.

-- Chao, K.-M. and Miller, W., 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.

    Multiple Sequence Alignment [2020/10/20]

    -- Multiple Sequence Alignment [2020/10/20]

Homology Search [2020/11/3, 2020/11/10]

-- Homology Search Tools (pdf) [2020/11/3, 2020/11/10]

   Burrows-Wheeler Transform and FM-Index [2020/11/10]

 

              -- BWT_TENETENET [2020/12/22]

 

   Genome Reconstruction by Phillip E. C. Compeau and Pavel A. Pevzner [2020/11/17]

    -- ppt (You may skip pages 125-175 & 301-340.) [2020/11/17]

    -- Lectures by Phillip E. C. Compeau

    -- B(2,3)  B(2,4)  [2020/11/17]

    -- B(3,1)&B(3,2) [2020/12/22]

    -- Eulerian Tours by Christos H. Papadimitriou & Umesh Vazirani
(Exchange 10 and 01 of the left figure on Page 4)

    -- An example for fragment assembly as an Eulerian cycle problem  [2020/11/17]

    Pattern Identification in a Haplotype Block (ppt, pdf)  [2020/11/24]
 

              -- An example  [2020/11/24]

 

    Haplotype Inference (ppt, pdf)  [2020/11/24]
    (Read the problem formulation on pp. 1262-1263 of the pdf file)

    -- Some other ways of formulation  [2020/11/24]

    The 3SUM Problem  [2020/12/1]

 

    Heaviest Segments  [2020/12/1]

   -- A Note on Computing Heaviest Segments  [2020/12/1]

    RMSQ  [2020/12/1]

   -- Chen, K.-Y. and Chao, K.-M.,  2007, “On the Range Maximum-Sum Segment Query Problem,” Discrete Applied Mathematics, 155: 2043-2052.  [2020/12/1]

Homework assignments:

 

#1: Handout: September 15, 2020; Due: October 2, 2020

    Send TA a PowerPoint page including: (Subject: [HW1] Your student ID number)

   A sample solution

   More sample solutions: 2007  2008  2009  2011  2015  2016  2017  2018  2019

 

Class presentations:

1.      The expected number of team members: 1~6;

2.      Each member is required to present in turn [about 10-15 minutes each];

3.      Revised slides should be sent to me within one week after the presentation.

4.      Questions in class are always welcome.

 

[Dec. 15 for preparation, no class]

 

Dec. 22

Mahdi Boroujeni, Masoud Seddighin, Saeed Seddighin:
Improved Algorithms for Edit Distance and LCS: Beyond Worst Case. SODA 2020: 1601-1620.

朱彥如  賴開元  周良冠

 

An Efficient Algorithm for Finding Special Palindromes - Yin-Chu Chen's Master Thesis (2019)

穆大有

 

Timo Lassmann:
Kalign 3: multiple sequence alignment of large datasets. Bioinform. 36(6): 1928-1929 (2020)

鄭士驤  馮才昇  陳傳諭  林祥瑞

 

Dec. 29

 

Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E
A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.
Science. 2012 Aug 17;337(6096):816-21.
(The CRISPR-Cas9 genome editing technique was a significant contributor to the Nobel Prize in Chemistry in 2020 being awarded to Emmanuelle Charpentier and Jennifer Doudna.)

Doudna JA.
The promise and challenge of therapeutic genome editing.
Nature. 2020 Feb;578(7794):229-236.

張晏瑋  陳炫均  柯叡澤

 

Abbott TR, Dhamdhere G, Liu Y, Lin X, Goudy L, Zeng L, Chemparathy A, Chmura S, Heaton NS, Debs R, Pande T, Endy D, La Russa MF, Lewis DB, Qi LS.
Development of CRISPR as an Antiviral Strategy to Combat SARS-CoV-2 and Influenza.
Cell. 2020 May 14;181(4):865-876.e12.

王又可  蘇郁文

 

Severe Covid-19 GWAS Group, Ellinghaus D, Degenhardt F, ..., Karlsen TH.
Genomewide Association Study of Severe Covid-19 with Respiratory Failure.
N Engl J Med. 2020 Oct 15;383(16):1522-1534.

黃冠瑋  闕中一  趙冠豪

 

Jan. 5

Pesho Ivanov, Benjamin Bichsel, Harun Mustafa, André Kahles, Gunnar Rätsch, Martin T. Vechev:
AStarix: Fast and Optimal Sequence-to-Graph Alignment. RECOMB 2020: 104-119

王彥仁  林庭風  林喬文  簡仲明

 

Tavor Z. Baharav, Govinda M. Kamath, David N. C. Tse, Ilan Shomorony:
Spectral Jaccard Similarity: A New Approach to Estimating Pairwise Sequence Alignments. RECOMB 2020: 223-225
(Also, Pattern Volume 1, Issue 6, 11 September 2020, 100081)

李智揚  張高登

 

 

Selected papers for presentation:

(You may access these selected articles using computers with NTU IP addresses.)

 

  1. Amir Abboud, Arturs Backurs, Virginia Vassilevska Williams:
    Tight Hardness Results for LCS and Other Sequence Similarity Measures. FOCS 2015: 59-78.

  2. Mahdi Boroujeni, Masoud Seddighin, Saeed Seddighin:
    Improved Algorithms for Edit Distance and LCS: Beyond Worst Case. SODA 2020: 1601-1620.

  3. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E
    A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.
    Science. 2012 Aug 17;337(6096):816-21.
    (The CRISPR-Cas9 genome editing technique was a significant contributor to the Nobel Prize in Chemistry in 2020 being awarded to Emmanuelle Charpentier and Jennifer Doudna.)

  4. Doudna JA.
    The promise and challenge of therapeutic genome editing.
    Nature. 2020 Feb;578(7794):229-236.

  5. Abbott TR, Dhamdhere G, Liu Y, Lin X, Goudy L, Zeng L, Chemparathy A, Chmura S, Heaton NS, Debs R, Pande T, Endy D, La Russa MF, Lewis DB, Qi LS.
    Development of CRISPR as an Antiviral Strategy to Combat SARS-CoV-2 and Influenza.
    Cell. 2020 May 14;181(4):865-876.e12.

  6. Severe Covid-19 GWAS Group, Ellinghaus D, Degenhardt F, ..., Karlsen TH.
    Genomewide Association Study of Severe Covid-19 with Respiratory Failure.
    N Engl J Med. 2020 Oct 15;383(16):1522-1534.

  7. Patrick Kunzmann, Benjamin E. Mayer, Kay Hamacher:
    Substitution matrix based color schemes for sequence alignment visualization. BMC Bioinform. 21(1): 209 (2020)

  8. Timo Lassmann:
    Kalign 3: multiple sequence alignment of large datasets. Bioinform. 36(6): 1928-1929 (2020)

  9. Pesho Ivanov, Benjamin Bichsel, Harun Mustafa, André Kahles, Gunnar Rätsch, Martin T. Vechev:
    AStarix: Fast and Optimal Sequence-to-Graph Alignment. RECOMB 2020: 104-119

  10. Tavor Z. Baharav, Govinda M. Kamath, David N. C. Tse, Ilan Shomorony:
    Spectral Jaccard Similarity: A New Approach to Estimating Pairwise Sequence Alignments. RECOMB 2020: 223-225
    (Also, Pattern Volume 1, Issue 6, 11 September 2020, 100081)

 

References (Recommended, but not required):
1. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
2. Biological Sequence Analysis, by Richard Durbin et al. (1998)
3. Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
4. An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel Pevzner (2004)
5. Sequence Comparison: Theory and Methods,” by Chao, K.-M. and Zhang, L. (2009)
6. Bioinformatics for Biologists, edited by Pavel Pevzner and Ron Shamir (2011)
7. Bioinformatics Algorithms: an Active Learning Approach, Vol. I, II by Phillip E. C. Compeau and Pavel A. Pevzner (August 2015)

[http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track]

8. Related journal and conference papers
 
Bibliography links: