Course: Algorithms for Biological Sequence Analysis
Fall semester, 2018
9:10 – 12:10 Tuesday, R107@CSIE Building
3 credits

Website: http://www.csie.ntu.edu.tw/~kmchao/seq18fall

Instructor: Kun-Mao Chao (趙坤茂)

Teaching assistant: Yi-Chen Huang (黃亦晨)  r06945025 followed by @ntu.edu.tw
[TA's office hours: by appointment; Venue: R432]

 

Prerequisites: Some basic knowledge on algorithms is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.

 

Classmates: I  II  III  IV  V  VI

 

Coursework:

    Homework assignments and Class participation (10%)

        #1: Handout: September 11, 2018; Due: September 28, 2018 (See below)

    Two midterm exams (70%; 35% each):

        Midterm #1: Oct. 23, 2018
       
Midterm #2: Dec. 4, 2018

    Oral presentation of selected papers/projects (20%)

        Dec. 11, Dec. 18, Dec. 25.

 

Supporting materials:

    About This Class     Survey [2018/9/11]

    Interesting Sequences [2018/9/11]

    A Whirlwind Tour of Bioinformatics [2018/9/18]

    Basic Algorithmic Strategies [2018/9/18, 2018/9/25]
 

              -- Basic Algorithmic Strategies [2018/9/18, 2018/9/25]

              -- LIS example [2018/9/25]

              -- LCS_example [2018/9/25]
 

    Sequence Alignment [2018/9/25, 2018/10/2]

-- Global Alignment [2018/9/25]

-- A Note on Computing Heaviest Segments [2018/9/25; Pages 7-8]

-- Local Alignment [2018/9/25, 2018/10/2]

-- Various Scoring Schemes [2018/10/2]

-- An affine-gap-penalty example [2018/10/2]

-- Example 2; Solution [2018/10/2]

-- Scoring scheme examples [2018/10/2]

-- Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53. [2018/9/25]

-- Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197. [2018/9/25, 2018/10/2]

-- Gotoh, Osamu: An improved algorithm for matching biological sequences. In: Journal of Molecular Biology. 162, 1982, S. 705-708 (PDF, 206 KB). [2018/10/2]

    Space-Saving Strategies [2018/10/2, 2018/10/9]

    Suboptimal Alignments [2018/10/9, 2018/10/16]

-- Space-Saving Strategies (pdf) [2018/10/2, 2018/10/9]

-- A Note for Method 7@Suboptimal Alignments [2018/10/16]

-- Eugene Myers and Webb Miller, "Optimal Alignments in Linear Space," CABIOS (Bioinformatics) 9: 169-176, 1988. [pdf]

-- Chao, K. -M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.

-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.

-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.

-- Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA.

-- Chao, K. -M., Hardison R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.

-- Chao, K.-M. and Miller, W., 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.

    Multiple Sequence Alignment [2018/10/16]

    -- Multiple Sequence Alignment [2018/10/16]

Homology Search [2018/10/30; 2018/11/6]

-- Homology Search Tools (pdf) [2018/10/30; 2018/11/6]

   Genome Reconstruction by Phillip E. C. Compeau and Pavel A. Pevzner [2018/11/6, 2018/11/13]

            pdf (the first one in the link), ppt (the third one in the link)
(For ppt, you may skip pages 125-175 & 301-340.)

    -- B(2,3) [2018/11/6, 2018/11/13]

    -- Eulerian Tours by Christos H. Papadimitriou & Umesh Vazirani [2018/11/6, 2018/11/13]
(Exchange 10 and 01 of the left figure on Page 4)

    -- An example for fragment assembly as an Eulerian cycle problem [2018/11/6, 2018/11/13]

    Pattern Identification in a Haplotype Block (ppt, pdf) [2018/11/13, 2018/11/20]
 

              -- An example [2018/11/13, 2018/11/20]

 

    Haplotype Inference (ppt, pdf) [2018/11/20]
    (Read the problem formulation on pp. 1262-1263 of the pdf file)

    -- Some other ways of formulation [2018/11/20]

    Heaviest Segments [2018/11/27]

   -- A Note on Computing Heaviest Segments [2018/11/27]

    RMSQ [2018/11/27]

-- Chen, K.-Y. and Chao, K.-M.,  2007, “On the Range Maximum-Sum Segment Query Problem,” Discrete Applied Mathematics, 155: 2043-2052. [2018/11/27]

Homework assignments:

 

#1: Handout: September 11, 2018; Due: September 28, 2018

    Send TA a PowerPoint page including: (Subject: [HW1] Your student ID number)

   A sample solution

   More sample solutions: 2007  2008  2009  2011 2015 2016 2017

   2018

 

Class presentations:

1.      The expected number of team members: <6;

2.      Each member is required to present in turn [about 150/(the number of speakers on the same day) minutes each];

3.      Revised slides should be sent to me within one week after the presentation. Please compress your figures.

4.      Questions in class are always welcome.

 

Dec. 11, 2018

Dec. 18, 2018

 

Dec. 25, 2018

 

Selected papers for presentation:

(You may access these selected articles using computers with NTU IP addresses.)

  1. Haplotype matching in large cohorts using the Li and Stephens model.
    Lunter G.
    Bioinformatics. 2018 Aug 25. doi: 10.1093/bioinformatics/bty735. [Epub ahead of print]

  2. CoMSA: Compression of protein multiple sequence alignment files.
    Deorowicz S, Walczyszyn J, Debudaj-Grabysz A.
    Bioinformatics. 2018 Jul 13. doi: 10.1093/bioinformatics/bty619. [Epub ahead of print]

  3. Parallel Computation of the Burrows-Wheeler Transform of Short Reads Using Prefix Parallelism.
    Kimura K, Koike A.
    IEEE/ACM Trans Comput Biol Bioinform. 2018 May 17. doi: 10.1109/TCBB.2018.2837749. [Epub ahead of print]

  4. Wheeler graphs: A framework for BWT-based data structures.
    Gagie T, Manzini G, Sirén J.
    Theor Comput Sci. 2017 Oct 25;698:67-78. doi: 10.1016/j.tcs.2017.06.016.

  5. Haplosaurus computes protein haplotypes for use in precision drug design
    William Spooner, William McLaren, Timothy Slidel, Donna K. Finch, Robin Butler, Jamie Campbell, Laura Eghobamien, David Rider, Christine Mione Kiefer, Matthew J. Robinson, Colin Hardman, Fiona Cunningham, Tristan Vaughan, Paul Flicek & Catherine Chaillan Huntington
    Nature Communications, 9, Article number: 4128 (2018).

  6. Family-based germline sequencing in children with cancer
    Michaela Kuhlen, Julia Taeubner, Triantafyllia Brozou, Dagmar Wieczorek, Reiner Siebert & Arndt Borkhardt
    Oncogene, Published: 10 October, 2018.

  7. Evaluation of 244,000 synthetic sequences reveals design principles to optimize translation in Escherichia coli
    Guillaume Cambray, Joao C Guimaraes & Adam Paul Arkin
    Nature Biotechnology, volume 36, pages 1005–1015 (2018).

  8. High-quality genome sequences of uncultured microbes by assembly of read clouds
    Alex Bishara, Eli L Moss, Mikhail Kolmogorov, Alma E Parada, Ziming Weng, Arend Sidow, Anne E Dekas, Serafim Batzoglou & Ami S Bhatt
    Nature Biotechnology, Published: 15 October, 2018.

  9. CRISPR knockout screening identifies combinatorial drug targets in pancreatic cancer and models cellular drug response
    Karol Szlachta, Cem Kuscu, Turan Tufan, Sara J. Adair, Stephen Shang, Alex D. Michaels, Matthew G. Mullen, Natasha Lopes Fischer, Jiekun Yang, Limin Liu, Prasad Trivedi, Edward B. Stelow, P. Todd Stukenberg, J. Thomas Parsons, Todd W. Bauer & Mazhar Adli
    Nature Communications, 9, Article number: 4275 (2018).

  10. Computational tools to unmask transposable elements
    Patricia Goerner-Potvin & Guillaume Bourque
    Nature Reviews Genetics, 19, pages688–704 (2018).

  11. Deep generative models of genetic variation capture the effects of mutations
    Adam J. Riesselman, John B. Ingraham & Debora S. Marks
    Nature Methods, 15, 816–822 (2018).

  12. In silico read normalization using set multi-cover optimization
    Dilip A Durai; Marcel H Schulz
    Bioinformatics, Volume 34, Issue 19, 1 October 2018, Pages 3273–3280.

  13. Minimap2: pairwise alignment for nucleotide sequences
    Heng Li
    Bioinformatics, Volume 34, Issue 18, 15 September 2018, Pages 3094–3100.

  14. Transcriptional fates of human-specific segmental duplications in brain
    Max L. Dougherty, Jason G. Underwood, Bradley J. Nelson, Elizabeth Tseng, Katherine M. Munson, Osnat Penn, Tomasz J. Nowakowski, Alex A. Pollen, and Evan E. Eichler
    Genome Res. October 2018 28: 1566-1576.

  15. Exploiting CRISPR–Cas9 technology to investigate individual histone modifications
    Juan-José Vasquez; Carolin Wedel; Raul O Cosentino; T Nicolai Siegel
    Nucleic Acids Research, Volume 46, Issue 18, 12 October 2018, Pages e106.

 

References (Recommended, but not required):
1. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997)
2. Biological Sequence Analysis, by Richard Durbin et al. (1998)
3. Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000)
4. An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel Pevzner (2004)
5. Sequence Comparison: Theory and Methods,” by Chao, K.-M. and Zhang, L. (2009)
6. Bioinformatics for Biologists, edited by Pavel Pevzner and Ron Shamir (2011)
7. Bioinformatics Algorithms: an Active Learning Approach, Vol. I, II by Phillip E. C. Compeau and Pavel A. Pevzner (August 2015)

[http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track]

8. Related journal and conference papers
 
Bibliography links: