922 U3710: AMMAI - ADVANCED TOPICS IN MULTIMEDIA ANALYSIS AND INDEXING
(高等多媒體資訊分析與檢索)
Spring 2010 (14:20 ~ 17:20, Thursday, CSIE RM#524)
Brief Introduction
This course focuses on recent developments in machine learning techniques that are promising for solving practical problems in video indexing and audio-visual content analysis. The goal is for students to become familiar with the state of the art, learn how to formulate and solve practical video indexing/analysis problems, and acquire hands-on experience through actual experiments. The course will cover selected topics in depth, such as:
- Graphical learning models:
  - MRF, HMM
  - Variational methods, loopy belief propagation, and Monte Carlo methods for inference
- Advanced image features (e.g., local features, shapes, etc.)
- Automatic image and video annotation
- Large-scale concept ontology for multimedia
- Automatic visual training data acquisition
- Manifold learning
- Ranking methods for search and semantic concept detection
- Large-scale image/video duplicate detection
- Distributed computation (e.g., MapReduce) for large-scale image/video analysis and retrieval
- Practical issues for crawling, indexing, and retrieval in large-scale visual search engines
Course Goals:
- Extending the breadth and depth of essential technical components of MMAI in feature representations and learning.
- Gaining practical experience through assignments and experiments.
- Practicing paper critiques, summarization, and presentations.
Prerequisites: Background in image processing (or signal processing related courses), probability, and linear algebra. Experience with machine learning or statistical pattern recognition will be useful but not required.
Course Format: The first half will consist of lectures given by the lecturer. The latter half will consist of paper critiques by students. Each student is expected to take on one topic (or paper).
Lecturer: Winston Hsu (office: R512, CSIE Building)
TA: TBA
Time: 14:20 ~ 17:20, Thursday
Location: RM#524, CSIE Building
Mailing List: All course announcements will be sent through the mailing list. Please subscribe at
https://cmlmail.csie.ntu.edu.tw/mailman/listinfo/ammai and browse the discussion archives.
Assessment:
- Assignments: 30%
- Presentations: 30%
- Paper critiques & summaries: 30%
- Course participation: 10%
Textbook: None. We will cover active research areas not yet included in any established textbook. Nevertheless, we will provide ample papers and reference books.
Students and Reading Blogs
Course Outline
Lecture 01 - Introduction (02/18/09, Wednesday)
- Introduction for the course and topics
- Readings:
  - "Image Retrieval: Ideas, Influences, and Trends of the New Age," Datta, 2008 (comprehensive and long; to be summarized the following week)
Lecture 02 - MMAI Overview and Preparations (02/25/09, Wednesday)
- MMAI recap by TA
- Video feature representations, shot segmentation
- Image feature representations, content-based image retrieval
- Basic mathematics tools
  - Probability 101, entropy, mutual information, etc.
- Readings:
  - "How to Read a Paper," Keshav, ACM SIGCOMM Computer Communication Review 2007. [m - must]
  - "How to give a good research talk," Jones et al. [m]
  - "Image Retrieval: Ideas, Influences, and Trends of the New Age," Datta, 2008 (comprehensive and long) [m]
  - "Writing Technical Articles," Henning Schulzrinne. [o - optional]
Tips for Student Presenters
The reading lists include both *must* papers and optional ones. The goal of each presentation is to help the audience and the presenter understand the breadth and depth of these problems. The presentation time for each topic is around 50 ~ 60 min. We can adjust the duration if necessary.
Presenters should cover the "must" papers, which are highly cited, in depth. However, we also expect presenters to convey the breadth of the problems. Please also discuss related work and how it compares; such work can be found in the optional papers. Students are encouraged to use other materials that help with the explanations. An introduction with sample code and real examples is the best way for the audience to grasp the details, so I encourage preparing these in advance if applicable.
The presentation guidelines may also be helpful to students.
Please chat with the lecturer one week before the presentation.
Course Material
Books:
- [Gold'99] Speech and Audio Signal Processing: Processing and Perception of Speech and Music, by Ben Gold and Nelson Morgan, Wiley, 1999
- [Bishop'06] Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer, 2006
- [Alpaydin'04] Introduction to Machine Learning, by Ethem Alpaydin, The MIT Press, 2004
- [Duda'02] Pattern Classification, by Richard Duda et al., 2nd Edition, Wiley-Interscience, 2000