Course: Algorithms for Biological Sequence Analysis
Fall semester, 2016
9:10 - 12:10 Monday, R105@CSIE Building
3 credits
Web site: http://www.csie.ntu.edu.tw/~kmchao/seq16fall
Instructor: Kun-Mao Chao (趙坤茂)
Teaching assistant: 鄭佳銘 r04922024 加NTU的IP (@ntu...)
[TA's office hours: 10-12AM, Tuesdays; Venue: R432]
Prerequisites: Some basic knowledge on algorithms is required. Background in bioinformatics and computational biology is welcome but not required for taking this course.
Classmates: I II III IV V VI VII
Coursework:
Homework assignments and Class participation (10%)
Two midterm exams (70%; 35% each):
1. Midterm #1: Oct. 31, 2016
2. Midterm #2: Dec. 5, 2016
Oral presentation of selected papers/projects (20%)
Supporting materials:
About This Class Survey [2016/9/12]
Interesting Sequences [2016/9/12]
A Whirlwind Tour of Bioinformatics [2016/9/19]
Basic Algorithmic Strategies [2016/9/19; 2016/9/26]
-- Basic Algorithmic Strategies [2016/9/19; 2016/9/26]
-- LIS example [2016/9/19]
Sequence Alignment [2016/9/26; 2016/10/3; 2016/10/17]
-- Global Alignment [2016/9/26]
-- Local Alignment [2016/9/26]
-- Various Scoring Schemes [2016/10/3]
-- An affine-gap-penalty example [2016/10/3]
-- Example 2; Solution [2016/10/17]
-- Scoring scheme examples [2016/10/3]
-- Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53.
-- Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197.
-- Gotoh, Osamu: An improved algorithm for matching biological sequences. In: Journal of Molecular Biology. 162, 1982, S. 705-708 (PDF, 206 KB).
Space-Saving Strategies [2016/10/17; 2016/10/24]
Suboptimal Alignments [2016/10/24]
-- Space-Saving Strategies (pdf) [2016/10/17; 2016/10/24]
-- Eugene Myers and Webb Miller, "Optimal Alignments in Linear Space," CABIOS (Bioinformatics) 9: 169-176, 1988. [pdf] [2016/10/17]
-- Chao, K. -M., Pearson, W. R. and Miller, W., 1992, Aligning Two Sequences within a Specified Diagonal Band, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 8: 481-487.
-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Constrained Sequence Alignment, Bulletin of Mathematical Biology, 55: 503-524.
-- Chao, K.-M., Hardison, R. C. and Miller, W., 1993, Locating Well-Conserved Regions within a Pairwise Alignment, Computer Applications in the Biosciences (CABIOS, now Bioinformatics), 9: 387-396.
-- Chao, K. -M., 1994, Computing All Suboptimal Alignments in Linear Space, Combinatorial Pattern Matching '94, Lecture Notes in Computer Science 807, 31-42, California, USA. [2016/10/24]
-- Chao, K. -M., Hardison R. C. and Miller, W., 1994, Recent Developments in Linear-Space Alignment Methods: a Survey, Journal of Computational Biology, 1: 271-291. [2016/10/17; 2016/10/24]
-- Chao, K.-M. and Miller, W., 1995, Linear-Space Algorithms that Build Local Alignments from Fragments, Algorithmica, 13: 106-134.
Multiple Sequence Alignment [2016/11/7; 2016/11/14]
-- Multiple Sequence Alignment [2016/11/7; 2016/11/14]
Homology Search [2016//11/7; 2016/11/14; You may skip pp. 19-22 and pp. 34-48 ]
-- Homology Search Tools (pdf) [2016/11/7; 2016/11/14]
Genome Reconstruction by Phillip E. C. Compeau and Pavel A. Pevzner [2016/11/21]
pdf (the first
one in the link),
ppt (the third
one in the link)
(For ppt, you
may skip pages 125-175 & 301-350.)
-- B(2,3) [2016/11/21]
-- Eulerian Tours by Christos H. Papadimitriou & Umesh Vazirani [2016/11/21]
(Exchange 10 and 01 of the left figure on Page 4)-- An example for fragment assembly as an Eulerian cycle problem [2016/11/21]
Pattern Identification in a Haplotype Block (ppt, pdf) [2016/11/28]
-- An example [2016/11/28]
Haplotype Inference (ppt,
pdf) [2016/11/28]
(Read the problem formulation on pp. 1262-1263 of the pdf
file)
-- Some other ways of formulation [2016/11/28]
-- A Note on Computing Heaviest Segments by Kun-Mao Chao []
RMSQ []
-- Chen, K.-Y. and Chao, K.-M., 2007, “On the Range Maximum-Sum Segment Query Problem,” Discrete Applied Mathematics, 155: 2043-2052. []
Homework assignments:
#1: Handout: September 12, 2016; Due: September 28, 2016
Send TA a PowerPoint page including: (Subject: [HW1] Your student ID number)
Your Name
A sequence which, in your opinion, is sort of interesting or inspiring.
A short reason explaining why it is interesting or inspiring.
[All in one page, please.]
More sample solutions: 2007 2008 2009 2011 2015
Class presentations:
1. The expected number of team members: <6;
2. Each member is required to present in turn [about 150/(the number of speakers on the same day) minutes each];
3. Revised slides should be sent to me within one week after the presentation. Please compress your figures.
4. Questions in class are always welcome.
Selected papers for presentation:
(You may access these selected articles using computers with NTU IP addresses.)
Dec. 12, 2016
Richard Van Noorden, Brendan Maher, and Regina Nuzzo, "The top 100 papers--Nature explores the most-cited research of all time," Nature 514, 550–553, 2014. Slides
徐銘聰 林盈安 簡上祐
BLAST
Altschul, Stephen;
Gish, Warren;
Miller, Webb;
Myers, Eugene;
Lipman, David (1990).
"Basic local alignment search tool". Journal of Molecular Biology
215 (3): 403–410.
Slides
傅安 陳岳昇 陳宗涵
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z.,
Miller, W., & Lipman, D. J. (1997).
Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Research, 25(17), 3389-3402.
陳奕均
謝秉翰
BLAST@NCBI
Slides
林哲賢 謝友恆 李沂芳 黃堂榮 林資皓
Patro R and
Kingsford C, Data-dependent bucketing improves reference-free compression of
sequencing reads, Bioinformatics, 31(17):2770-7, 2015.
Slides
張嘉和
張育銓
Kingsford C and
Patro R, Reference-based compression of short-read sequences using path
encoding, Bioinformatics, 31(12):1920-8, 2015.
周詮儒
陳星妘
Dec. 19, 2016
S P Pfeifer,
From next-generation resequencing reads to a high-quality variant data set,
Heredity (advance online publication 19 October 2016; doi:
10.1038/hdy.2016.102).
廖珮函 吳季芸 劉倢伶 莊立宇
Joshua G.
Schraiber and Joshua M. Akey,
Methods and
models for unravelling human evolutionary history,
Nature Reviews Genetics (Published online 10 November 2015).
王毅
王景逸
黃家富
翁婉倩
M. Gallego
Llorente, et al.,
Ancient
Ethiopian genome reveals extensive Eurasian admixture throughout the African
continent,
Science 13 November 2015: Vol. 350 no. 6262 pp. 820-822.
陳彥安
沈予涵
William Chow,
Kim Brugger, Mario Caccamo, Ian Sealy, James Torrance, and Kerstin Howe, gEVAL — a
web-based browser for evaluating genome assemblies, Bioinformatics, 32:
2508-2510, 2016.
許靖 張宇軒
Nicholas J.
Loman and Mark J. Pallen,
Twenty years of
bacterial genome sequencing,
Nature Reviews Microbiology (Published online 09 November 2015).
張乘若 雷晏菁
Dec. 26, 2016
John R. McCarrey,
The epigenome—a
family affair,
Science 6 November 2015: Vol. 350 no. 6261 pp. 634-635.
周柳村 陳善文 黃皇堯 陳柏銓
Pearson, WR.,
Selecting the
Right Similarity-Scoring Matrix.,
Curr Protoc Bioinformatics. 2013.
劉瀚聲
張凱捷
Wang, Q., Korkin, D., and Shang, Y., “A
Fast Multiple Longest Common Subsequence (MLCS) Algorithm,”
IEEE Transactions on Knowledge and Data Engineering (TKDE), vol.
23(3), 2011.
黃亦晨 李為紹 許凱翔
Li, H. and
Durbin, R.,
Fast and
accurate short read alignment with Burrows-Wheeler transform
Bioinformatics. 2009 Jul 15;25(14):1754-60.
簡少凡,李景立,程子玲
Gates W.H. and Papadimitriou, C.H.
Bounds for
sorting by prefix reversal.
Discrete Math. 27 (1979), 47--57.
徐新凱
夏維良
何俊宏
Abdullah-Al Mamun, Soumitra Pal and Sanguthevar Rajasekaran, KCMBT: a k-mer Counter based on Multiple Burst Trees, Bioinformatics, 32: 2783-2790, 2016.
[http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track]