Course: Algorithms for Biological Sequence Analysis
Fall semester, 2019
9:10 – 12:10 Tuesday, R107@CSIE Building
3 credits
Website: http://www.csie.ntu.edu.tw/~kmchao/seq19fall
Instructor: Kun-Mao Chao (趙坤茂)
Teaching assistant: 陳百榆
r07922177
followed by @ntu.edu.tw
[TA's office hours: by appointment; Venue: R432]
Prerequisites: Some basic knowledge on algorithms is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.
Coursework:
Homework assignments and Class participation (10%)
#1: Handout: September 10, 2019; Due: September 27, 2019 (See below)
Two midterm exams (70%; 35% each):
Midterm #1: Oct. 22, 2019
Midterm #2: Dec. 3, 2019
Oral presentation of selected papers/projects (20%)
[Dec. 10 for preparation],
Dec. 17, Dec. 24, Dec. 31.
Supporting materials:
About This Class Survey [2019/9/10]
Interesting Sequences [2019/9/10]
A Whirlwind Tour of Bioinformatics [2019/9/10]
Basic Algorithmic Strategies
[2019/9/17]
-- Basic Algorithmic Strategies [2019/9/17]
-- LIS example [2019/9/17]
-- LCS_example [2019/9/17]
Sequence Alignment [2019/9/17, 2019/9/24, 2019/10/1]
-- Global Alignment [2019/9/17, 2019/9/24]
-- A Note on Computing Heaviest Segments [Pages 7-8] [2019/9/24]
-- Local Alignment [2019/9/24]
-- Various Scoring Schemes [2019/9/24]
-- An affine-gap-penalty example [2019/9/24]
-- Example 2; Solution [2019/9/24]
-- Scoring scheme examples [2019/9/24]
-- Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53.
-- Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197.
-- Gotoh, Osamu: An improved algorithm for matching biological sequences. In: Journal of Molecular Biology. 162, 1982, S. 705-708 (PDF, 206 KB).
Space-Saving Strategies [2019/10/1]
Suboptimal Alignments [2019/10/8]
-- Space-Saving Strategies (pdf) [2019/10/1, 2019/10/8]
-- A Note for Method 7@Suboptimal Alignments [2019/10/8]
-- Eugene Myers and Webb Miller, "Optimal Alignments in Linear Space," CABIOS (Bioinformatics) 9: 169-176, 1988. [pdf]
-- Chao, K. -M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.
-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.
-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.
-- Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA.
-- Chao, K. -M., Hardison R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.
-- Chao, K.-M. and Miller, W., 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.
Multiple Sequence Alignment [2019/10/15]
-- Multiple Sequence Alignment [2019/10/15]
Homology Search [2019/10/29, 2019/11/5]
-- Homology Search Tools (pdf) [2019/10/29, 2019/11/5]
Burrows-Wheeler Transform and FM-Index [2019/10/29]
Genome Reconstruction by Phillip E. C. Compeau and Pavel A. Pevzner [2019/11/12]
-- ppt (You may skip pages 125-175 & 301-340.) [2019/11/12]
-- Lectures by Phillip E. C. Compeau
-- B(2,3) [2019/11/12]
-- Eulerian Tours by Christos H. Papadimitriou & Umesh Vazirani [2019/11/12]
(Exchange 10 and 01 of the left figure on Page 4)-- An example for fragment assembly as an Eulerian cycle problem [2019/11/12]
Pattern Identification in
a Haplotype Block (ppt,
pdf) [2019/11/19]
-- An example [2019/11/19]
Haplotype Inference (ppt,
pdf) [2019/11/19, 2019/11/26]
(Read the problem formulation on pp. 1262-1263 of the pdf
file)
-- Some other ways of formulation [2019/11/26]
Heaviest Segments [updated on 2019/11/26]
-- A Note on Computing Heaviest Segments [2019/11/26]
RMSQ [updated on 2019/11/26]
-- Chen, K.-Y. and Chao, K.-M., 2007, “On the Range Maximum-Sum Segment Query Problem,” Discrete Applied Mathematics, 155: 2043-2052. [2019/11/26]
Homework assignments:
#1: Handout: September 10, 2019; Due: September 27, 2019
Send TA a PowerPoint page including: (Subject: [HW1] Your student ID number)
Your Name
A sequence which, in your opinion, is sort of interesting or inspiring.
A short reason explaining why it is interesting or inspiring.
[All in one page, please.]
More sample solutions: 2007 2008 2009 2011 2015 2016 2017 2018
Class presentations:
1. The expected number of team members: 1~6;
2. Each member is required to present in turn [about 10 minutes each];
3. Revised slides should be sent to me within one week after the presentation.
4. Questions in class are always welcome.
[Dec. 10 for preparation, no class]
Dec. 17
Deep generative models of genetic variation capture the effects of mutations
Adam J. Riesselman, John B. Ingraham & Debora S. Marks
Nature Methods, 15, 816–822 (2018).
黃筑葭 Andreas Pirchner Eloi Dussy Lachaud 林溥博
Ultrafast and memory-efficient alignment of short DNA sequences to the human
genome
Langmead B, Trapnell C, Pop M, Salzberg SL.
Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25.
陳百榆 王俊中
JUHA KARKKAINEN, PETER SANDERS, STEFAN BURKHARDT
JACM, Volume 53 Issue 6, November 2006.
徐展鴻
The
streaming k-mismatch problem.
Raphaël Clifford, Tomasz Kociumaka, Ely Porat
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, SODA 2019, 1106-1125.
張顥霆 林映廷 廖名淳
Dec. 24
Approximating LCS in Linear Time: Beating the √n Barrier.
MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, Xiaorui Sun
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, SODA 2019, 1181-1200.
曾奕青 蔡昀達 張凱捷 謝宏祺 (11/20+)
Massively Parallel Approximation Algorithms for Edit Distance and Longest
Common Subsequence.
MohammadTaghi Hajiaghayi, Saeed Seddighin, Xiaorui Sun
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, SODA 2019, 1654-1672.
紀伯翰 林義聖 黃子賢 楊書文
Efficiently Approximating Edit Distance Between Pseudorandom Strings.
William Kuszmaul
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, SODA 2019, 1165-1180.
洪靖秦 陳柔安 廖克允 蕭大哲 魏孜昀 傅家靖
Dec. 31
MSC: a metagenomic sequence classification algorithm
Subrata Saha, Jethro Johnson, Soumitra Pal, George M Weinstock, Sanguthevar
Rajasekaran
Bioinformatics, Volume 35, Issue 17, 1 September 2019, Pages 2932–2940.
楊宗儒 黃志揚
CoMSA: Compression of protein multiple sequence alignment files.
Deorowicz S, Walczyszyn J, Debudaj-Grabysz A.
Bioinformatics. 2018 Jul 13. doi: 10.1093/bioinformatics/bty619. [Epub ahead
of print]
林政豪 張立暐
Minimap2: pairwise alignment for nucleotide sequences
Heng Li
Bioinformatics, Volume 34, Issue 18, 15 September 2018, Pages 3094–3100.
王佑安 賴達 陳昇 洪浩翔
High-quality genome
sequences of uncultured microbes by assembly of read clouds (11/13+)
Alex Bishara, Eli L Moss, Mikhail Kolmogorov, Alma E Parada, Ziming Weng,
Arend Sidow, Anne E Dekas, Serafim Batzoglou & Ami S Bhatt
Nature Biotechnology, Published: 15 October, 2018.
譚雋飛
In silico read normalization using set multi-cover optimization (12/22+)
Dilip A Durai; Marcel H Schulz
Bioinformatics, Volume 34, Issue 19, 1 October 2018, Pages 3273–3280.
徐衍新
Selected papers for presentation:
(You may access these selected articles using computers with NTU IP addresses.)
Approximating LCS in Linear Time: Beating the √n Barrier.
MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, Xiaorui Sun
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, SODA 2019, 1181-1200.
Massively Parallel Approximation Algorithms for Edit Distance and Longest
Common Subsequence.
MohammadTaghi Hajiaghayi, Saeed Seddighin, Xiaorui Sun
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, SODA 2019, 1654-1672.
Efficiently Approximating Edit Distance Between Pseudorandom Strings.
William Kuszmaul
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, SODA 2019, 1165-1180.
The
streaming k-mismatch problem.
Raphaël Clifford, Tomasz Kociumaka, Ely Porat
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete
Algorithms, SODA 2019, 1106-1125.
CoMSA: Compression of protein multiple sequence alignment files.
Deorowicz S, Walczyszyn J, Debudaj-Grabysz A.
Bioinformatics. 2018 Jul 13. doi: 10.1093/bioinformatics/bty619. [Epub ahead
of print]
Evaluation of 244,000 synthetic sequences reveals design principles to
optimize translation in Escherichia coli
Guillaume Cambray, Joao C Guimaraes & Adam Paul Arkin
Nature Biotechnology, volume 36, pages 1005–1015 (2018).
High-quality genome
sequences of uncultured microbes by assembly of read clouds
Alex Bishara, Eli L Moss, Mikhail Kolmogorov, Alma E Parada, Ziming Weng,
Arend Sidow, Anne E Dekas, Serafim Batzoglou & Ami S Bhatt
Nature Biotechnology, Published: 15 October, 2018.
CRISPR knockout screening identifies combinatorial drug targets in
pancreatic cancer and models cellular drug response
Karol Szlachta, Cem Kuscu, Turan Tufan, Sara J. Adair, Stephen Shang, Alex
D. Michaels, Matthew G. Mullen, Natasha Lopes Fischer, Jiekun Yang, Limin
Liu, Prasad Trivedi, Edward B. Stelow, P. Todd Stukenberg, J. Thomas
Parsons, Todd W. Bauer & Mazhar Adli
Nature Communications, 9, Article number: 4275 (2018).
Deep generative
models of genetic variation capture the effects of mutations
Adam J. Riesselman, John B. Ingraham & Debora S. Marks
Nature Methods, 15, 816–822 (2018).
In silico read normalization using set multi-cover optimization
Dilip A Durai; Marcel H Schulz
Bioinformatics, Volume 34, Issue 19, 1 October 2018, Pages 3273–3280.
Minimap2: pairwise alignment for nucleotide sequences
Heng Li
Bioinformatics, Volume 34, Issue 18, 15 September 2018, Pages 3094–3100.
MSC: a metagenomic sequence classification algorithm
Subrata Saha, Jethro Johnson, Soumitra Pal, George M Weinstock, Sanguthevar
Rajasekaran
Bioinformatics, Volume 35, Issue 17, 1 September 2019, Pages 2932–2940.
Improved indel detection in DNA and RNA via realignment with
ABRA2
Lisle E Mose, Charles M Perou, Joel S Parker
Bioinformatics, Volume 35, Issue 17, 1 September 2019, Pages 2966–2973.
PopCluster: an algorithm to identify genetic variants with
ethnicity-dependent effects
Anastasia Gurinovich, Harold Bae, John J Farrell, Stacy L Andersen, Stefano
Monti ...
Bioinformatics, Volume 35, Issue 17, 1 September 2019, Pages 3046–3054.
Ultrafast and memory-efficient alignment of short DNA sequences to the human
genome
Langmead B, Trapnell C, Pop M, Salzberg SL.
Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25.
Fast
gapped-read alignment with Bowtie 2
Ben Langmead and Steven L Salzberg
Nat Methods. 2012 Mar 4; 9(4): 357–359.
[http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track]