922 U3570: MULTIMEDIA ANALYSIS AND INDEXING -- MMAI (多媒體資訊分析與檢索)
Fall 2009 (14:20 ~ 17:20, Tuesday, CSIE RM#111)
Brief Introduction
With recent advances in communications, computers, and storage capacities, multimedia streams (e.g., videos, photos, music) are becoming increasingly important information sources. Dealing with such enormous and diverse information raises challenging theoretical problems and meets strong industrial needs. This course gives an introductory treatment of these issues. Students will gain practical experience through intensive hands-on homework assignments. The topics include the following:
- Machine learning techniques such as graphical models, discriminative models, clustering approaches, etc.
- Multimedia (video/photo/music) feature representations
- Construction of high-level indices
- Content analysis and object recognition
- Multimedia indexing and retrieval
- Multimedia data mining
- Information exploitation in social media
- Summarization, personalization, and visualization of large-scale multimedia databases
- Standards and applications in video, photo, and medical image databases
- Benchmarks and evaluation metrics
Prerequisites: Background in image processing (or signal processing related courses), probability, and linear algebra. Experience with machine learning or statistical pattern recognition will be useful but not required.
Lecturer: Winston Hsu (office: R512, CSIE Building)
TA: Kuang-Ting Chen, ktchen {at} cmlab.csie.ntu.edu.tw, R501 (office hour: 13:30 ~ 15:30, Thursday), TA's Q&A
Time: 14:20 ~ 17:10, Tuesday
Location: RM 111, CSIE Building
Mailing List: https://cmlmail.csie.ntu.edu.tw/mailman/listinfo/mmai
Assessment:
- Assignments : 30%
- video shot detection
- content-based image retrieval (CBIR)
- video concept detection/classification
- Midterm Exam.: 20%
- Final Project: 50%
Textbook: None. We will cover some active research areas not yet covered in any established textbook. Nevertheless, we will provide ample papers and reference books.
Course Outline
Lecture 01 - Introduction (09/15/09, Tuesday)
(a short version)
- Introduction of the course and topics to be covered
- Demo of example systems for audio/visual analysis/indexing/retrieval
- Logistical issues regarding grading, homework, references, etc.
- MATLAB introduction by TA
- Reading:
- Peter Lyman, et al., "How Much Information? 2003," University of California at Berkeley.
- Dan Ellis, "Extracting Information from Music Audio," Communications of the ACM, special issue on Music Information Retrieval, vol. 49, no. 8, pp.32-37, August 2006.
- Howard D. Wactlar, Alexander G. Hauptmann, Michael G. Christel, Ricky A. Houghton, Andreas M. Olligschlaeger, "Complementary Video and Audio Analysis for Broadcast News Archives," Communications of the ACM, 43(2):42-47.
- Shih-Fu Chang, R. Manmatha, Tat-Seng Chua. "Combining Text and Audio-Visual Features in Video Indexing." In IEEE ICASSP 2005, Philadelphia, PA, March 2005.
- Shih-Fu Chang, Wei-Ying Ma, Arnold Smeulders, "Recent Advances and Challenges of Semantic Image/Video Search," In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hawaii, USA, April 2007.
Lecture 02/03 - Video/Image Standards and Syntax Analysis (09/22/09, 09/29/09, Tuesday)
- Introduction of video/image standards
- Video syntax analysis
- HW#1 - video shot detection & videos (due: noon, Oct. 6)
- Reading:
- D. Le Gall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of ACM, April 1991, Vol 34, No. 4, pp. 46-58.
- J.S. Boreczky, L.A. Rowe, "Comparison of video shot boundary detection techniques," Proc of SPIE- Storage and Retrieval for Still Image and Video Databases IV, Vol. 2670, San Diego, 1996. (overview paper, must-read)
- I. Koprinska, S. Carrato, "Temporal video segmentation: a survey," Signal Processing: Image Communication, vol. 16, pp. 477--500, 2001. (overview paper, must-read) (Sec. 3.2 & 3.3, skipped)
- J.S. Boreczky, L.D. Wilcox, "A hidden Markov model framework for video segmentation using audio and image features," in Proc. Int. Conf. Acoustics, Speech, and Signal. Processing (ICASSP-98), Vol. 6, Seattle, WA, May 1998.
- S. Uchihashi and J. Foote, "Summarizing Video Using a Shot Importance Measure and a Frame-Packing Algorithm," In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol. 6, pp. 3041-3044, 1999.
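As a rough illustration of the histogram-comparison family of shot boundary detectors surveyed in the readings above, the sketch below declares a hard cut wherever the L1 distance between consecutive frame histograms exceeds a threshold. The frame representation (non-empty flat lists of 8-bit gray values), the bin count, the threshold, and the function names are assumptions chosen for illustration; this is not the HW#1 reference solution, and it does not handle gradual transitions such as fades or dissolves.

```python
def gray_histogram(frame, bins=16):
    """Normalized histogram of a frame given as a flat list of 0-255 gray values."""
    hist = [0.0] * bins
    for value in frame:
        hist[min(value * bins // 256, bins - 1)] += 1.0
    total = float(len(frame))
    return [count / total for count in hist]


def detect_cuts(frames, threshold=0.5, bins=16):
    """Return indices i where a hard cut is declared between frame i-1 and frame i."""
    cuts = []
    previous = None
    for index, frame in enumerate(frames):
        current = gray_histogram(frame, bins)
        if previous is not None:
            # L1 distance between normalized histograms lies in [0, 2].
            distance = sum(abs(a - b) for a, b in zip(previous, current))
            if distance > threshold:
                cuts.append(index)
        previous = current
    return cuts


# Toy clip (hypothetical data): three dark frames followed by three bright frames.
dark = [10, 12, 11, 9] * 4
bright = [240, 238, 242, 241] * 4
clip = [dark, dark, dark, bright, bright, bright]
print(detect_cuts(clip))  # [3]
```

A fixed threshold is the simplest choice; the overview papers above compare it with adaptive thresholds and other frame-difference measures.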
Lecture 04 - Color for Content-Based Image Retrieval (10/06/09, Tuesday) 
- Overview of similarity-based image/video retrieval
- Color features and metrics for CBIR
- Reading:
- Chapter 11 of book [Castelli'01] (overview paper, must-read)
- Y. Rui, T.S. Huang, and S.-F. Chang, "Image Retrieval: Current Techniques, Promising Directions, and Open Issues," J. Visual Comm. and Image Representation, vol. 10, no. 1, pp. 39-62, 1999.(overview paper, must-read)
- M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: The QBIC system. In IEEE Computer, volume 28, pages 23-31, 1995.
- John R. Smith, Shih-Fu Chang, "VisualSEEk: a Fully Automated Content-Based Image Query System," In ACM Multimedia, Boston, MA, November 1996.
- Kieran McDonald, Alan F. Smeaton, "A Comparison of Score, Rank and Probability-Based Fusion Methods for Video Shot Retrieval," CIVR 2005. (multimodal fusion)
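To make the idea of color features and metrics for CBIR concrete, here is a minimal sketch that quantizes RGB pixels into a joint color histogram and ranks database images by histogram intersection, one common color similarity measure. The pixel-list image format, the 4 bins per channel, the toy database, and all function names are assumptions made for this example rather than anything prescribed by the course.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram; pixels is a list of (r, g, b) tuples in 0-255."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        rq = r * bins_per_channel // 256
        gq = g * bins_per_channel // 256
        bq = b * bins_per_channel // 256
        hist[(rq * bins_per_channel + gq) * bins_per_channel + bq] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query_pixels, database):
    """Rank database images (name -> pixel list) by similarity to the query, best first."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in database.items()
    ]
    return [name for score, name in sorted(scored, reverse=True)]


# Hypothetical toy database of tiny "images".
database = {
    "sunset": [(250, 120, 30)] * 6 + [(200, 60, 20)] * 2,
    "forest": [(20, 140, 40)] * 8,
    "ocean": [(10, 60, 200)] * 8,
}
query = [(245, 110, 25)] * 8
print(rank_by_color(query, database)[0])  # sunset
```

Coarse quantization keeps the histogram small and tolerant of minor color shifts, at the cost of merging colors that a finer grid or a perceptual color space would separate.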
Lecture 05/07 - Texture and Shape for CBIR (10/13, 10/27/09, Tuesday)
- Texture and shape features in statistical & spectral domains
- HW#2 - Content-based Image Retrieval (due: Nov. 3) - [dataset]
- Reading:
- "Texture features for browsing and retrieval of image data," B. S. Manjunath and W.Y. Ma, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.18, no.8, pp.837-42, Aug 1996. (must-read)
- "Filtering for Texture Classification: A Comparative Study," Randen, et al., IEEE Trans. Image Processing, 1999.
- "Textural Features Corresponding to Visual Perception," Hideyuki Tamura. Shunji Mori. Takashi Yamawaki. IEEE Transactions on Systems, Man, and Cybernetics, No. 6, June 1978
- "Benchmarking of image features for content-based retrieval," Wei-Ying Ma and Hong Jiang Zhang, Record of the 32nd Asilomar Conf. on Signals, Systems & Computers, 1998, Vol 1.
- "Color and Texture Descriptors," B. S. Manjunath, Jens-Rainer Ohm, Vinod V. Vasudevan, Akio Yamada, IEEE Transactions on Circuits and Systems for Video Technology, Vol 11, No. 6, June 2001.
- "MPEG-7 visual shape descriptors," Miroslaw Bober, IEEE Transactions on Circuits and Systems for Video Technology, Vol 11, No. 6, June 2001.
Lecture 06 - Introduction of Audio/Music (10/20/09, Tuesday)
- Guest speaker: Eric Yang
- Audio applications (event detection, retrieval, emotion)
- Feature representations and processing techniques
- Sampling & windowing
- The Discrete-Time Fourier Transform (DTFT)
- Linear prediction/linear predictive coding (LPC)
- Mel frequency cepstral coefficients (MFCC)
- Other features
- Reading:
- "Musical genre classification of audio signals," Tzanetakis, IEEE Trans. Speech and Audio Processing, 2002. (must-read)
- "Content-Based Music Information Retrieval: Current Directions and Future Challenges," Casey et al., Proceedings of IEEE, 2008. (must-read)
- "Semantic Context Detection Using Audio Event Fusion," Chu, Journal on Advances in Signal Processing, 2006.
Lecture 08/09 - Feature Reduction and Multidimensional Indexing (11/03/09, 11/10/09, Tuesday)
- The curse of dimensionality
- Feature reduction for high-dimensional data
- Multidimensional indexing
- SVD demo (sample codes + jpeg image)
- Reading:
- "Multidimensional Indexing Structures for Content-based Retrieval," Vittorio Castelli, IBM Research Report, 2001. (overview paper, must-read)
- "Eigenfaces for recognition," M Turk, A Pentland - Journal of Cognitive Neuroscience, 1991. (must-read)
- "Matrices, vector spaces, and information retrieval," Michael W. Berry, Zlatko Drmavc, and Elizabeth R. Jessup, SIAM Review, 41(2):335-362, June 1999. (SVD related)
- "Multidimensional access methods," Gaede, V., Gunther, O., Journal ACM Comp. Surveys, Vol 30, Num 2, 1998. (overview paper)
- Malcolm Slaney and Michael Casey, "Locality-Sensitive Hashing for Finding Nearest Neighbors," IEEE Signal Processing Magazine, 2008. (must-read)
- Alexandr Andoni and Piotr Indyk, "Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions," Communications of the ACM, 2008. (good paper)
- S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman and A. Wu, "An optimal algorithm for approximate nearest neighbor searching," Journal of the ACM, 1998.
- Giang P. Nguyen and Marcel Worring. Interactive access to large image collections using similarity-based visualization. Journal of Visual Languages and Computing, 19(2):203-224, April 2008.
Lecture 10 - Midterm (11/17/09, Tuesday)
- Coverage for the midterm: Lecture 1 ~ ??.
- Problems are drawn from lecture slides and *must-read* papers; expect high-level questions and some calculations.
- Open book, but no laptops or mobile devices in the classroom.
- Gentle reminder -- save paper. It is OK not to bring printed lecture slides.
Lecture 11 - Hash-based Indexing (11/24/09, Tuesday) 
- Tree-based indexing, KD-tree
- Hash-based indexing, hashing functions for LSH
- Reading:
- Malcolm Slaney and Michael Casey, "Locality-Sensitive Hashing for Finding Nearest Neighbors," IEEE Signal Processing Magazine, 2008. (must-read)
- Alexandr Andoni and Piotr Indyk, "Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions," Communications of the ACM, 2008. (good paper)
- M. Datar, et al. Locality-sensitive hashing scheme based on p-stable distributions. SoCG 2004.
- P. Indyk et al. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998
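The hash family described in the p-stable reading above has the form h(v) = floor((a . v + b) / w), where a has Gaussian entries, b is uniform in [0, w), and w is the bucket width. The sketch below builds a single hash table from a few such functions. The class name, parameter defaults, and the toy vectors are assumptions for illustration; a practical index would use several tables and tuned parameters.

```python
import math
import random


class PStableLSH:
    """One hash table whose key concatenates k hashes h(v) = floor((a . v + b) / w)."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = bucket_width
        # Gaussian entries are 2-stable, which matches Euclidean distance.
        self.projections = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)
        ]
        self.offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = {}

    def key(self, vector):
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, vector)) + b) / self.width)
            for proj, b in zip(self.projections, self.offsets)
        )

    def insert(self, label, vector):
        self.buckets.setdefault(self.key(vector), []).append((label, vector))

    def query(self, vector):
        """Return the closest item in the query's bucket, or None if the bucket is empty."""
        candidates = self.buckets.get(self.key(vector), [])
        if not candidates:
            return None
        return min(candidates, key=lambda item: math.dist(item[1], vector))[0]


index = PStableLSH(dim=3)
index.insert("a", [1.0, 2.0, 3.0])
index.insert("b", [50.0, -40.0, 9.0])
print(index.query([1.0, 2.0, 3.0]))  # a
```

Only the items that share the query's bucket are compared exactly, which is where the speed-up over a linear scan comes from. The price is that a true near neighbor can land in a different bucket and be missed, so recall is usually raised by querying several independent tables.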
Lecture 12/13 - Overview for Concept (Semantic)-based Image/Video Analysis and Retrieval (12/01/09, 12/08/09, Tuesday)
- Briefing on research regarding concept design, extraction, and search.
- Part of the tutorials covered in top conferences (e.g., ACM Multimedia and SIGIR 2008, Multimedia 2009, and ICASSP 2009).
- Individual project meetings (15 min/group)
- 1pm ~ 5pm, Friday, December 11
- 10am ~ 12pm, Monday, December 14
- Reading:
- M. R. Naphade, J. R. Smith, J. Tesic, S. F. Chang, W. Hsu, L. Kennedy, A. Hauptmann and J. Curtis, "Large-scale concept ontology for multimedia," IEEE MultiMedia Magazine, 13 (3), Sep. 2006. (must-read)
- Lexing Xie, Rong Yan, "Extracting Semantics from Multimedia Content: Challenges and Solutions," In Multimedia Content Analysis: Theory and Applications, A. Divakaran Ed., Springer, 2008.(recommended)
- M. Naphade, J. R. Smith, "On the Detection of Semantic Concepts at TRECVID," ACM Multimedia 2004.
- A. G. Hauptmann, R. Yan, W.H. Lin, M. Christel and H. Wactlar, "Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News." In IEEE Transactions on Multimedia, Aug. 2007.
- Timo Volkmer, John R. Smith, Apostol (Paul) Natsev, "A web-based system for collaborative annotation of large image and video collections: an evaluation and user study," ACM Multimedia 2005.
Lecture 14 - Video Structure/Event Discovery (12/15/09, Tuesday)
- Generative models -- HMM
- Video Structure discovery
- Broadcast news videos
- Maximum Entropy method
- HMM method
- Sports highlights
- HW#3 - paper reading, summary, and critiques for [Xie02Structure] and [Hsu03Statistical]; due Dec. 29.
- Reading:
- L. Xie, S.-F. Chang, A. Divakaran, and H. Sun, "Structure analysis of soccer video with hidden markov models," ICASSP, May 2002.(must-read)
- Xinguo Yu , Changsheng Xu , Hon Wai Leong , Qi Tian , Qing Tang , Kong Wah Wan, "Trajectory-based ball detection and tracking with applications to semantic analysis of broadcast soccer video," ACM Multimedia, 2003.
- L. Chaisorn, T.-S Chua and C.-H. Lee, "The segmentation of news video into story units," ICME 2002.
- Winston H. Hsu and Shih-Fu Chang, "A Statistical Framework for Fusing Mid-level Perceptual Features in News Story Segmentation," ICME, Baltimore, 2003. (must-read)
- Yoshua Bengio, "Markovian Models for Sequential Data," 1999
- Jeff A. BILMES, "What HMMs Can Do," IEICE Transactions on Information and Systems, 2006.
- Bayes Net Toolbox for MATLAB (for graphical models)
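As a small illustration of how an HMM can segment a video into structural states, the sketch below runs Viterbi decoding on a discrete observation sequence. The two states ("play" and "break"), the "high"/"low" motion observations, and every probability value are made-up assumptions for the example; they are not taken from the papers above. The code also assumes all probabilities are strictly positive, since it works in log space.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM, computed in log space."""
    scores = [
        {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}
    ]
    back = []
    for obs in observations[1:]:
        row, pointers = {}, {}
        for s in states:
            best_prev = max(
                states, key=lambda p: scores[-1][p] + math.log(trans_p[p][s])
            )
            row[s] = (
                scores[-1][best_prev]
                + math.log(trans_p[best_prev][s])
                + math.log(emit_p[s][obs])
            )
            pointers[s] = best_prev
        scores.append(row)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical model: states persist (0.9 self-transition) and emit noisy motion levels.
states = ["play", "break"]
start_p = {"play": 0.5, "break": 0.5}
trans_p = {"play": {"play": 0.9, "break": 0.1}, "break": {"play": 0.1, "break": 0.9}}
emit_p = {"play": {"high": 0.8, "low": 0.2}, "break": {"high": 0.2, "low": 0.8}}
motion = ["high", "high", "low", "high", "high", "low", "low", "low"]
print(viterbi(motion, states, start_p, trans_p, emit_p))
```

With these values the isolated "low" at the third position stays labeled "play", while the sustained run of "low" at the end switches to "break". This smoothing of per-frame evidence by the transition model is the main reason to prefer an HMM over classifying each frame independently.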
Lecture 15 - Clustering Approaches for Visual Document Organization (12/22/09, Tuesday)
- Conventional Clustering
- K-means
- GMM
- Hierarchical clustering
- Video clip similarity
- Earth Mover's Distance
- Video Signature
- News story threading (tracking/clustering)
- Reading:
- Chapter 10 - Unsupervised Learning and Clustering of [Duda'02]
- A. K. Jain, M. N. Murty and P. J. Flynn, Data Clustering: A Review, ACM Computing Surveys, 1999.
- Greg Hamerly, Charles Elkan. Learning the K in K-Means, NIPS 2003.
- Yuxin Peng, Chong-Wah Ngo, "EMD-Based Video Clip Retrieval by Many-to-Many Matching," CIVR 2005: 71-81.
- Cheung, S.-C. and A. Zakhor, "Estimation of web video multiplicity," in Proceedings of the SPIE -- Internet Imaging, Volume 3964, pp. 34-36, 2000.
- Winston H. Hsu, Shih-Fu Chang: Topic Tracking Across Broadcast News Videos with Visual Duplicates and Semantic Concepts. ICIP 2006: 141-144 (must-read)
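For readers new to the conventional clustering methods listed above, here is a minimal sketch of K-means (Lloyd's algorithm) on small feature tuples. The random initialization from data points, the fixed iteration count, the handling of empty clusters, and the toy 2-D points are all assumptions made for this example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical 2-D features forming two well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The first three points end up sharing one label and the last three the other; which group receives label 0 depends on the random initialization. Choosing k itself is a separate problem, addressed by the "Learning the K in K-Means" reading above.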
Lecture 16 - Video Retrieval (12/29/09, Tuesday) 
- Concept search (search over large-scale concept ontology)
- Query by multiple examples
- Video threading and retrieval
- Reading:
- Apostol Natsev, Alexander Haubold, Jelena Tesic, Lexing Xie, Rong Yan, "Semantic concept-based query expansion and re-ranking for multimedia retrieval," ACM Multimedia 2007. (recommended)
- A. Haubold and A. Natsev, "Semantic multimedia retrieval using lexical query expansion and model-based reranking," In International Conference on Multimedia and Expo(ICME), 2006.
- Lyndon Kennedy, Shih-Fu Chang, "A Reranking Approach for Context-based Concept Fusion in Video Indexing and Retrieval," ACM International Conference on Image and Video Retrieval, Amsterdam, Netherlands, July 2007.
- "Video Search Reranking through Random Walk over Document-Level Context Graph," Winston H. Hsu, Lyndon Kennedy, and Shih-Fu Chang, ACM Multimedia 2007, Augsburg, Germany, September 23-29, 2007.
Lecture 17 - Project Presentation (01/05/10, Tuesday)
- Presentation: 10 min/group, 01/05/2010, starting earlier at 1:30pm.
- Final report: 01/18/2010
- Presentation list

- Bring your mugs.
Course Projects
Purpose: posting student projects in the course and recruiting project members
PAST COURSE PROJECTS 
Course Material
Books:
- [Castelli'01] Image Databases: Search and Retrieval of Digital Imagery , by Vittorio Castelli and Lawrence D. Bergman, Wiley-Interscience, 2001
- [Gold'99] Speech and Audio Signal Processing: Processing and Perception of Speech and Music, by Ben Gold and Nelson Morgan, Wiley, 1999
- [Bishop'06] Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer, 2006
- [Alpaydin'04] Introduction to Machine Learning, by Ethem Alpaydin, The MIT Press, 2004
- [Duda'02] Pattern Classification, by Richard Duda, et al., 2nd Edition, Wiley-Interscience, 2000.
Papers:
- TRECVID
- The TRECVID page contains rich information regarding research activities, past workshop papers, and a relevant bibliography of peer-reviewed TRECVID work published in other venues.
- Alexander G. Hauptmann and Michael G. Christel, “Successful approaches in the TREC Video Retrieval Evaluations,” in ACM Multimedia 2004, New York, 2004.
- Milind R. Naphade and John R. Smith, “On the detection of semantic concepts at TRECVID,” in ACM Multimedia, New York, 2004, pp. 660–667.
- MISC
- S. Antani, R. Kasturi, and R. Jain. "A survey on the use of pattern recognition methods for abstraction, indexing, and retrieval of images and video," Pattern Recognition, 35(4):945-965, 2002.
- Y. Rui, T.S. Huang, and S.-F. Chang, "Image Retrieval: Current Techniques, Promising Directions, and Open Issues," J. Visual Comm. and Image Representation, vol. 10, no. 1, pp. 39-62, 1999.