922 U3710: AMMAI - ADVANCED TOPICS IN MULTIMEDIA ANALYSIS AND INDEXING
(高等多媒體資訊分析與檢索; in English, "Advanced Multimedia Information Analysis and Retrieval")
Spring 2011 (14:20 ~ 17:20, Thursday, CSIE RM#542)
Brief Introduction
This course focuses on recent developments in machine learning techniques that are promising for solving practical problems in video indexing and audio-visual content analysis. The goal is for students to become familiar with the state of the art, learn how to formulate and solve practical video indexing/analysis problems, and acquire hands-on experience through actual experiments. The course will cover selected topics in depth, such as:
- Advanced image features (e.g., local features, shapes, etc.)
- Hashing techniques
- Sparse coding
- Automatic image and video annotation
- Large-scale concept ontology for multimedia
- Automatic visual training data acquisition
- Manifold learning
- Ranking methods for search and semantic concept detection
- Large-scale image/video duplicate detection
- Distributed computation (e.g., MapReduce) for large-scale image/video analysis and retrieval
- Practical issues for crawling, indexing, and retrieval in large-scale visual search engines
- Graphical learning models:
- MRF, HMM
- Variational methods, loopy belief propagation, and Monte Carlo methods for inference
Course Goals:
- Extending breadth and depth in the essential technical components of MMAI, particularly feature representations and learning.
- Gaining practical experience through assignments and experiments.
- Practicing paper critiques, summarization, and presentations.
Prerequisites: Background in image processing (or signal processing related courses), probability, and linear algebra. Experience with machine learning or statistical pattern recognition will be useful but not required.
Course Format: Lectures by the instructor and paper critiques by students. Each student is expected to be assigned one topic (or paper).
Lecturer: Winston Hsu (office: R512, CSIE Building)
TA: Liang-Chi Hsieh
Time: 14:20 ~ 17:20, Thursday; No lectures on March 31 (NII Shonan Meeting), May 25 (ICASSP 2011)
Location: RM#542, CSIE Building
Mailing List: All course announcements will be sent through the mailing list. Please subscribe at
https://cmlmail.csie.ntu.edu.tw/mailman/listinfo/ammai, where you can also browse the discussion archives.
Assessment:
- Assignments: 20% (two experiments)
- Presentations: 30%
- Course participation & paper summarization: 20%
- Final Project: 30%
Textbook: None. We will cover active research areas not yet included in any established textbook. We will, however, provide plenty of papers and reference books.
Students and Reading Blogs
Project Groups
Course Outline
Lecture 01 - Introduction (02/24/11, Thursday)
- Introduction for the course and topics
- Basic paper reading, critique, and presentation techniques
- Readings:
- "How to Read a Paper," Keshav, ACM SIGCOMM Computer Communication Review 2007. [m - must] (no summary)
- "How to give a good research talk," Jones et al. [m] (no summary)
- "Writing Technical Articles," Henning Schulzrinne. [o - optional]
- "Present Like Steve Jobs" [video]
- "Image Retrieval: Ideas, Influences, and Trends of the New Age," Datta, 2008 (comprehensive and long) [o]
Lecture 02 - Interest Points and Local Descriptors (03/03/11, Thursday)
- Presenters
- Winston Hsu - interest points and local descriptors
- Winston Hsu - visual words and advanced issues
- Readings:
- "Distinctive Image Features from Scale-Invariant Keypoints," Lowe, IJCV, 2004. [m]
- "Efficient visual search of videos cast as text retrieval," J. Sivic and A. Zisserman, IEEE TPAMI, 2009. [m]
- "A Performance Evaluation of Local Descriptors," Mikolajczyk, PAMI 2005. [o]
- "A Comparison of Affine Region Detectors," Mikolajczyk, IJCV, 2004. [o]
- "Video Google: A Text Retrieval Approach to Object Matching in Videos," J. Sivic and A. Zisserman, ICCV, 2003. [o]
- "Scale & Affine Invariant Interest Point Detectors," Mikolajczyk, IJCV, 2004. [o]
- "ContextSeer: Context Search and Recommendation at Query Time for Shared Consumer Photos," Yi-Hsuan Yang, Po-Tun Wu, Ching-Wei Lee, Kuan-Hung Lin, Winston H. Hsu, ACM Multimedia 2008. [o]
- Supplemental materials
Lecture 03 - Advanced Topics for Large-Scale Image Retrieval (03/10/11, Thursday)
- Presenters
- Winston Hsu
- Readings:
- Herve Jegou, et al., Aggregating local descriptors into a compact image representation, Proc. IEEE CVPR'10 [m]
- F. Perronnin, et al., Large-scale image retrieval with compressed Fisher vectors. In CVPR, June 2010. [o]
- J. Philbin, et al., Descriptor Learning for Efficient Retrieval, European Conference on Computer Vision, 2010 [o]
Lecture 04 - Hashing and Semantic-Preserving Hashing (03/17/11, Thursday)
- Presenters: kuonini

- Readings:
- R. Salakhutdinov et al., “Semantic Hashing”, SIGIR, 2007.
- Y. Weiss et al., “Spectral Hashing”, NIPS, 2008.
- B. Kulis et al., “Kernelized Locality-Sensitive Hashing for Scalable Image Search,” ICCV, 2009.
- J. Wang et al., "Semi-Supervised Hashing for Scalable Image Retrieval," CVPR, 2010.
- A. Torralba, “Small Codes and Large Image Databases for Recognition,” CVPR, 2008.
- T. Huang et al., “Mediaprinting: Identifying Multimedia Content for Digital Rights Management,” IEEE Computer, 2010.
- Y.-G. Jiang et al., "Lost in Binarization: Query-Adaptive Ranking for Similar Image Search with Compact Codes," ICMR, 2011. [m]
Lecture 05/06 - Latent Semantic Analysis (I) (03/24/11, Thursday)
- Presenters
- Winston - pLSA
- Winston - Information Bottleneck Principle
- Winston - LDA
- Assignment #1: pLSA
- Readings:
- "Probabilistic latent semantic indexing," T. Hofmann, SIGIR, 1999. [m]
- "Latent Dirichlet allocation," D. Blei, A. Ng, and M. Jordan, Journal of Machine Learning Research, 3:993–1022, January 2003. [m]
- "Unsupervised Learning by Probabilistic Latent Semantic Analysis," T. Hofmann, Machine Learning, 2001. (an extended version) [o]
- "Document Clustering using Word Clusters via the Information Bottleneck Method," Noam Slonim and Naftali Tishby, SIGIR 2000. [o]
- "Indexing by Latent Semantic Analysis," Deerwester, 1990. [o]
- "Image retrieval on large-scale image databases," Eva Horster, Rainer Lienhart, Malcolm Slaney, CIVR 2007. [o]
- "A Bayesian Hierarchical Model for Learning Natural Scene Categories," Fei-Fei Li, CVPR 2005.
Lecture 07 - Manifold Methods (04/07/11, Thursday)
- Presenters
- Wei-Lun Chao

- Readings:
- "Nonlinear dimensionality reduction by locally linear embedding," Roweis & Saul, Science, 2000. [m]
- "Graph Embedding and Extensions: A General Framework for Dimensionality Reduction," Shuicheng Yan et al., PAMI 2007.
- "An Introduction to Locally Linear Embedding," Saul & Roweis. [o]
- "The Manifold Ways of Perception," Seung & Lee, Science, 2000. [o]
- "Linear Discriminant Analysis in Document Classification," Torkkola. [o]
- "Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds," L. K. Saul and S. T. Roweis, Journal of Machine Learning Research, 2004. (an extended version of LLE) [o]
- Supplemental materials
Lecture 08 - Latent Semantic Analysis (II) (04/14/11, Thursday)
- Presenters
- Winston - LDA
- Readings:
- "Latent Dirichlet allocation," D. Blei, A. Ng, and M. Jordan, Journal of Machine Learning Research, 3:993–1022, January 2003. [m]
- "Efficient Indexing for Large Scale Visual Search," Xiao Zhang et al., ICCV 2009. [o]
Lecture 09 - Learning to Rank (I) - RankSVM + AdaRank (04/21/11, Thursday)
- Presenters: Winston
- Readings:
- "Support vector learning for ordinal regression," R. Herbrich, ICANN, 1999. [m]
- "Optimizing search engines using clickthrough data," T. Joachims, ACM SIGKDD, 2002. [o; helpful for understanding RankSVM]
- "AdaRank: a boosting algorithm for information retrieval," Jun Xu, Hang Li, SIGIR 2007.
Lecture 10 - Learning to Rank (II) - ListNet + Reranking (04/28/11, Thursday)
- Presenters: Winston
- Readings:
- "Learning to rank: from pairwise approach to listwise approach," Cao, ICML, 2007. [m]
- "Feature Selection for Ranking," Xiubo Geng, Tie-Yan Liu, et al. SIGIR 2007
- "Reranking Methods for Visual Search," Hsu, IEEE Multimedia, 2007. [o]
Lecture 11 - Mining People Attributes and Activities (05/05/11, Thursday)
- Presenters:
- Readings:
- N. Kumar, et al., “Describable Visual Attributes for Face Verification and Image Search,” PAMI, 2011. [m]
- N. Kumar, et al., “Attribute and simile classifiers for face verification,” CVPR 2009.
- N. Kumar, et al., “FaceTracer: A Search Engine for Large Collections of Images with Faces,” ECCV 2008.
- R. Garg, et al., “Where’s Waldo: Matching People in Images of Crowds,” CVPR 2011. [m]
- B. Siddiquie, et al., “Image Ranking and Retrieval based on Multi-Attribute Queries,” CVPR 2011.
Lecture 12 - High Performance Analytics (I): Current Solutions (05/12/11, Thursday)
- Presenters: Winston
- Readings:
- A. Gates et al., “Building a high-level dataflow system on top of Map-Reduce: the Pig experience,” VLDB 2009. [m]
- J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” USENIX OSDI, 2004.
- J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Communications of the ACM, 2008.
- B. White, et al., “Web-scale computer vision using MapReduce for multimedia data mining,” MDMKDD 2010.
Lecture 13 - High Performance Analytics (II): Algorithms for Image/Video Analysis (05/19/11, Thursday)
- Presenters: Winston
- Readings:
- U. Kang, et al., “PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations,” ICDM 2009. [m]
- C. Chu et al., “Map-reduce for machine learning on multicore,” NIPS 2007.
- Z. Sun, et al., “Large-Scale Matrix Factorization Using MapReduce,” ICDMW 2010
- U. Kang, et al., “Inference of Beliefs on Billion-Scale Graphs,” KDD 2010.
Lecture 14 - Object Localization (06/02/11, Thursday)
- Presenters: A-da, Chieh-Chi
- Readings:
- Efficient Algorithms for Subwindow Search in Object Detection and Localization, CVPR 2009.
- Fast concurrent object localization and recognition, CVPR 2009. [m] (05/20)
- Detecting Objects in Large Image Collections and Videos by Efficient Subimage Retrieval, Christoph H. Lampert, ICCV 2009.
Lecture 15 - Sparse Coding (06/09/11, Thursday)
- Presenters: Winston, Sammy
- Readings:
- Elad and Aharon. Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries. IEEE Transactions on Image Processing (2006) vol. 15 (12) pp. 3736-3745.
- Mairal et al. Online dictionary learning for sparse coding. ICML 2009. [m]
- Shenghua Gao et al. Local features are not lonely – Laplacian sparse coding for image classification. CVPR 2010.
- Jianchao Yang et al. Linear spatial pyramid matching using sparse coding for image classification. CVPR 2009.
Lecture 16 - Project Presentation (06/16/11, Thursday)
Tips for Student Presenters
The reading lists include both *must* papers and optional ones. The goal of each presentation is to help the audience and the presenters understand the breadth and depth of these problems. The presentation time for each topic is around 50 ~ 60 minutes; we can adjust the duration if necessary.
Presenters should cover the "must" papers in depth, as these are highly cited. However, we also expect presenters to convey the breadth of the problems: please discuss related work and how it compares, drawing on the optional papers. Students are encouraged to use other materials that help with the explanations. An introduction with sample code and real examples is the best way for the audience to grasp the details, so I encourage preparing these in advance where applicable.
The presentation guidelines may also be helpful for students.
Please chat with the lecturer one week before the presentation.
Course Material
Books:
- [Gold'99] Speech and Audio Signal Processing: Processing and Perception of Speech and Music, by Ben Gold and Nelson Morgan, Wiley, 1999
- [Bishop'06] Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer, 2006
- [Alpaydin'04] Introduction to Machine Learning, by Ethem Alpaydin, The MIT Press, 2004
- [Duda'02] Pattern Classification, by Richard Duda, et al., 2nd Edition, Wiley-Interscience, 2000.