[2017-03-10] Dr. Meng Tsung Tsai, Rutgers University, " On the Analysis of Massive Data Sets: Sampling, Sparse Recovery, and Communication Complexity”
Title: On the Analysis of Massive Data Sets: Sampling, Sparse Recovery, and Communication Complexity
Date: 2017-03-10 2:20pm-3:30pm
Location: R102, CSIE
Speaker: Dr. Meng Tsung Tsai, Rutgers University
Hosted by: Prof. Yuh-Dauh Lyuu
Physical memory is small. When data sets do not fit into memory, RAM algorithms cannot be used to extract valuable information from unstructured data efficiently. One frequentlyused approach to resolve the issue is to draw some random samples for analysis rather than process the entire set of data. Such sampling technique is space-efficient and is usually sufficient to obtain useful information. For most cases, data sets are not static, i.e. new data can be added and aged data can be removed. These dynamic updates make the sampling more challenging. To tackle the challenges, we appeal to sparse recovery and demonstrate this technique by showing some applications in graph streaming, such as k-connectivity, degeneracy, etc. Sampling techniques are useful. However, for some difficult problems, it is impossible to give a sufficiently good solution using little memory. We show how to prove the impossibility by communication complexity.
Meng-Tsung Tsai is currently a Ph.D. student in Computer Science at Rutgers University. He received his B.S. (resp. M.S.) degree in Computer Science and Information Engineering from National Taiwan University in 2006 (resp. 2008). His research focuses on spaceefficient algorithms, I/O-efficient databases, and fundamental techniques for processing huge amounts of unstructured data.