Course: Algorithms for Biological Sequence Analysis
Fall semester, 2018
9:10 – 12:10 Tuesday, R107@CSIE Building
3 credits
Website: http://www.csie.ntu.edu.tw/~kmchao/seq18fall
Instructor: Kun-Mao Chao (趙坤茂)
Teaching assistant: Yi-Chen Huang (黃亦晨) r06945025
followed by @ntu.edu.tw
[TA's office hours: by appointment; Venue: R432]
Prerequisites: Some basic knowledge on algorithms is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.
Coursework:
Homework assignments and Class participation (10%)
#1: Handout: September 11, 2018; Due: September 28, 2018 (See below)
Two midterm exams (70%; 35% each):
Midterm #1: Oct. 23, 2018
Midterm #2: Dec. 4, 2018
Oral presentation of selected papers/projects (20%)
Dec. 11, Dec. 18, Dec. 25.
Supporting materials:
About This Class Survey [2018/9/11]
Interesting Sequences [2018/9/11]
A Whirlwind Tour of Bioinformatics [2018/9/18]
Basic Algorithmic Strategies
[2018/9/18, 2018/9/25]
-- Basic Algorithmic Strategies [2018/9/18, 2018/9/25]
-- LIS example [2018/9/25]
-- LCS_example [2018/9/25]
Sequence Alignment [2018/9/25, 2018/10/2]
-- Global Alignment [2018/9/25]
-- A Note on Computing Heaviest Segments [2018/9/25; Pages 7-8]
-- Local Alignment [2018/9/25, 2018/10/2]
-- Various Scoring Schemes [2018/10/2]
-- An affine-gap-penalty example [2018/10/2]
-- Example 2; Solution [2018/10/2]
-- Scoring scheme examples [2018/10/2]
-- Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53. [2018/9/25]
-- Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197. [2018/9/25, 2018/10/2]
-- Gotoh, Osamu: An improved algorithm for matching biological sequences. In: Journal of Molecular Biology. 162, 1982, S. 705-708 (PDF, 206 KB). [2018/10/2]
Space-Saving Strategies [2018/10/2, 2018/10/9]
Suboptimal Alignments [2018/10/9, 2018/10/16]
-- Space-Saving Strategies (pdf) [2018/10/2, 2018/10/9]
-- A Note for Method 7@Suboptimal Alignments [2018/10/16]
-- Eugene Myers and Webb Miller, "Optimal Alignments in Linear Space," CABIOS (Bioinformatics) 9: 169-176, 1988. [pdf]
-- Chao, K. -M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.
-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.
-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.
-- Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA.
-- Chao, K. -M., Hardison R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291.
-- Chao, K.-M. and Miller, W., 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.
Multiple Sequence Alignment [2018/10/16]
-- Multiple Sequence Alignment [2018/10/16]
Homology Search [2018/10/30; 2018/11/6]
-- Homology Search Tools (pdf) [2018/10/30; 2018/11/6]
Genome Reconstruction by Phillip E. C. Compeau and Pavel A. Pevzner [2018/11/6, 2018/11/13]
pdf (the first
one in the link),
ppt (the third
one in the link)
(For ppt, you
may skip pages 125-175 & 301-340.)
-- B(2,3) [2018/11/6, 2018/11/13]
-- Eulerian Tours by Christos H. Papadimitriou & Umesh Vazirani [2018/11/6, 2018/11/13]
(Exchange 10 and 01 of the left figure on Page 4)-- An example for fragment assembly as an Eulerian cycle problem [2018/11/6, 2018/11/13]
Pattern Identification in
a Haplotype Block (ppt,
pdf) [2018/11/13, 2018/11/20]
-- An example [2018/11/13, 2018/11/20]
Haplotype Inference (ppt,
pdf) [2018/11/20]
(Read the problem formulation on pp. 1262-1263 of the pdf
file)
-- Some other ways of formulation [2018/11/20]
Heaviest Segments [2018/11/27]
-- A Note on Computing Heaviest Segments [2018/11/27]
RMSQ [2018/11/27]
-- Chen, K.-Y. and Chao, K.-M., 2007, “On the Range Maximum-Sum Segment Query Problem,” Discrete Applied Mathematics, 155: 2043-2052. [2018/11/27]
Homework assignments:
#1: Handout: September 11, 2018; Due: September 28, 2018
Send TA a PowerPoint page including: (Subject: [HW1] Your student ID number)
Your Name
A sequence which, in your opinion, is sort of interesting or inspiring.
A short reason explaining why it is interesting or inspiring.
[All in one page, please.]
More sample solutions: 2007 2008 2009 2011 2015 2016 2017
Class presentations:
1. The expected number of team members: <6;
2. Each member is required to present in turn [about 150/(the number of speakers on the same day) minutes each];
3. Revised slides should be sent to me within one week after the presentation. Please compress your figures.
4. Questions in class are always welcome.
Dec. 11, 2018
Dec. 18, 2018
陳凱普
游文萱
Dec. 25, 2018
陳耘志
陳力
駱定瑄
陳主牧
于建民 藍祥予
李衡
賴達
王建元
陳譽穆
吳孟軒
陳博彰
Selected papers for presentation:
(You may access these selected articles using computers with NTU IP addresses.)
Haplotype matching in
large cohorts using the Li and Stephens model.
Lunter G.
Bioinformatics. 2018 Aug 25. doi: 10.1093/bioinformatics/bty735. [Epub ahead
of print]
CoMSA: Compression of protein multiple sequence alignment files.
Deorowicz S, Walczyszyn J, Debudaj-Grabysz A.
Bioinformatics. 2018 Jul 13. doi: 10.1093/bioinformatics/bty619. [Epub ahead
of print]
Parallel Computation of the Burrows-Wheeler Transform of Short Reads Using
Prefix Parallelism.
Kimura K, Koike A.
IEEE/ACM Trans Comput Biol Bioinform. 2018 May 17. doi:
10.1109/TCBB.2018.2837749. [Epub ahead of print]
Wheeler graphs: A framework for BWT-based data structures.
Gagie T, Manzini G, Sirén J.
Theor Comput Sci. 2017 Oct 25;698:67-78. doi: 10.1016/j.tcs.2017.06.016.
Haplosaurus computes protein haplotypes for use in precision drug design
William Spooner, William McLaren, Timothy Slidel, Donna K. Finch, Robin
Butler, Jamie Campbell, Laura Eghobamien, David Rider, Christine Mione
Kiefer, Matthew J. Robinson, Colin Hardman, Fiona Cunningham, Tristan
Vaughan, Paul Flicek & Catherine Chaillan Huntington
Nature Communications, 9, Article number: 4128 (2018).
Family-based germline sequencing in children with cancer
Michaela Kuhlen, Julia Taeubner, Triantafyllia Brozou, Dagmar Wieczorek,
Reiner Siebert & Arndt Borkhardt
Oncogene, Published: 10 October, 2018.
Evaluation of 244,000 synthetic sequences reveals design principles to
optimize translation in Escherichia coli
Guillaume Cambray, Joao C Guimaraes & Adam Paul Arkin
Nature Biotechnology, volume 36, pages 1005–1015 (2018).
High-quality genome sequences of uncultured microbes by assembly of read
clouds
Alex Bishara, Eli L Moss, Mikhail Kolmogorov, Alma E Parada, Ziming Weng,
Arend Sidow, Anne E Dekas, Serafim Batzoglou & Ami S Bhatt
Nature Biotechnology, Published: 15 October, 2018.
CRISPR knockout screening identifies combinatorial drug targets in
pancreatic cancer and models cellular drug response
Karol Szlachta, Cem Kuscu, Turan Tufan, Sara J. Adair, Stephen Shang, Alex
D. Michaels, Matthew G. Mullen, Natasha Lopes Fischer, Jiekun Yang, Limin
Liu, Prasad Trivedi, Edward B. Stelow, P. Todd Stukenberg, J. Thomas
Parsons, Todd W. Bauer & Mazhar Adli
Nature Communications, 9, Article number: 4275 (2018).
Computational tools to unmask transposable elements
Patricia Goerner-Potvin & Guillaume Bourque
Nature Reviews Genetics, 19, pages688–704 (2018).
Deep generative models of genetic variation capture the effects of mutations
Adam J. Riesselman, John B. Ingraham & Debora S. Marks
Nature Methods, 15, 816–822 (2018).
In silico read normalization using set multi-cover optimization
Dilip A Durai; Marcel H Schulz
Bioinformatics, Volume 34, Issue 19, 1 October 2018, Pages 3273–3280.
Minimap2: pairwise alignment for nucleotide sequences
Heng Li
Bioinformatics, Volume 34, Issue 18, 15 September 2018, Pages 3094–3100.
Transcriptional fates of human-specific segmental duplications in brain
Max L. Dougherty, Jason G. Underwood, Bradley J. Nelson, Elizabeth Tseng,
Katherine M. Munson, Osnat Penn, Tomasz J. Nowakowski, Alex A. Pollen, and
Evan E. Eichler
Genome Res. October 2018 28: 1566-1576.
Exploiting CRISPR–Cas9 technology to investigate individual histone
modifications
Juan-José Vasquez; Carolin Wedel; Raul O Cosentino; T Nicolai Siegel
Nucleic Acids Research, Volume 46, Issue 18, 12 October 2018, Pages e106.
[http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track]