----- Original Message -----
From: Kun-Mao Chao
To: ACB
Sent: Monday, November 17, 2008 10:55 AM
Subject: Two talks in December

Dear ACB members,
 
In December, I shall host two talks on bioinformatics: one by Steven Skiena and the other by Yang-Ho Chen, a former ACB member. Two coincidences: (1) both of them will talk about next-generation DNA sequencing technologies; and (2) Yang-Ho's current advisor is Prof. Ting Chen, who was advised by Prof. Skiena back in the 1990s. Meanwhile, Prof. Skiena's repository site (http://www.cs.sunysb.edu/~algorith/) is a popular site for algorithm implementations. (Yao-Ting once got some code from there.)
 
Below please find the flyers for the two talks. I have also attached a nice review paper on this interesting topic. Let us form a study group on this topic soon.
 
Cheers,
 
Kun-Mao
 
PS. I won't pass Yang-Ho's draft around because it is still under review. Yang-Ho, welcome back!
 
 
Talk #1: December 12, 2008 (CSIE seminar time)


      Assembly for Double-Ended Short-Read Sequencing Technologies

                   Steven Skiena

              Department of Computer Science
              State University of New York
              Stony Brook, NY 11794-4400 USA
              http://www.cs.sunysb.edu/~skiena


Next-generation sequencing technologies developed by Solexa/Illumina,
Agencourt/ABI, and Helicos Biosciences yield sequencing reads which
are dramatically shorter (20-40 bases) but vastly cheaper than
those produced by the previous generation of sequencing machines.
We study the space of read length, sequencing error rate, and coverage
that lies well outside conventional assumptions to determine the
technological/economic parameters where de novo sequencing
will be achievable with these new technologies.

We prove that genome assembly on bacterial and human sequences is
possible using astonishingly short reads, given sufficiently high coverage.
In particular, we demonstrate that we can assemble bacterial genomes using
data from ABI's recently launched SOLiD sequencer.

(Joint work with J. Chen and S. Hossain.)
 
Biography: Steven Skiena is Professor of Computer Science at SUNY Stony Brook.
His research interests include the design of graph, string, and geometric
algorithms, and their applications (particularly to biology).  He is the
author of four books, including "The Algorithm Design Manual" and
"Calculated Bets: Computers, Gambling, and Mathematical Modeling to Win".
He is a recipient of the ONR Young Investigator Award and the
IEEE Computer Science and Engineering Undergraduate Teaching Award.
 

 

Talk #2: 1:20pm December 10, 2008; Room R107

ReSEQ: Mapping Reads with Statistical Evaluation of Quality Scores for Genome Resequencing
 
Yangho Chen
Program in Computational Biology and Bioinformatics
University of Southern California
 
We have developed ReSEQ, a program which efficiently maps millions of short reads from a Solexa high-throughput sequencer onto a reference genome or transcriptome. ReSEQ iteratively maps, weights, and calls SNPs for reads to maximize the accuracy in estimation of the target genome or the expression levels. The mapping algorithm in ReSEQ uses optimal single spaced seeds and presents an integer programming method to generate paired seeds. The spaced seeds allow efficient reporting of all alignments within three substitutions or insertions/deletions of length less than three base pairs. Compared to other existing methods, the single spaced seed increases speed and sensitivity while requiring only one-third of the memory used by existing programs. This design makes it possible to load the hash table for the whole human genome or transcriptome into memory on a server or desktop, respectively. For each alignment, ReSEQ calculates statistical significance using a dynamic programming algorithm on a Markov model learned from the background sequences. ReSEQ estimates the rates at which the machines create sequencing errors with an EM algorithm. Reads which map significantly to multiple locations are weighted according to the probability that each location is responsible for the read, balancing the goal of high coverage with the necessity of statistical rigor. ReSEQ uses a likelihood ratio test based on quality scores to distinguish sequencing errors from SNPs, iteratively re-mapping reads to provide the best estimate for the target genome from which reads were sequenced. Test results show that iterative re-mapping and re-estimation of the target genome sequence significantly increase the number of mapped reads and called SNPs.

(Joint work with Tade Souaiaia and Ting Chen.)

Biography: Yangho Chen is currently a Ph.D. candidate at USC. He received the B.S. degree in computer science and information engineering from National Taiwan University in 2003. His research interests include algorithms and bioinformatics.