922 U3570: MULTIMEDIA ANALYSIS AND INDEXING -- MMAI
(多媒體資訊分析與檢索)
Fall 2011 (14:20 ~ 17:20, Tuesday, CSIE RM#111)
Brief Introduction
With recent advances in communications, computers, and storage
capacities, multimedia streams (i.e., videos, photos, music) are
becoming increasingly important information sources. To deal with such
enormous and diverse information, there arise challenging theoretical
problems and strong industrial needs. We will preliminarily address such
issues in this course. Students in the course will
gain practical experience through hands-on homework and the final
project. The topics include the following:
- Machine learning techniques such as graphical models, discriminative
models, clustering approaches, etc.
- Multimedia (video/photo/music) feature representations
- Construction of high-level indices
- Content analysis and object recognition
- Multimedia indexing and retrieval
- Multimedia data mining
- Information exploitation in social media
- Summarization, personalization, and visualization of large-scale
multimedia databases
- Standards and applications in video, photo, and medical image
databases
- Benchmarks and evaluation metrics
Prerequisites: Background in image processing (or
signal processing related courses), probability, and linear algebra.
Experience with machine learning or statistical pattern recognition will
be useful but not required.
Lecturer: Winston Hsu (office: R512, CSIE Building)
TA: Yan-Ying Chen, yanying {at} gmail.com, R506 (office hours:
3:30-5:30pm, Friday)
Time: 14:20 ~ 17:20, Tuesday
Location: RM 111, CSIE Building
Mailing List: https://cmlmail.csie.ntu.edu.tw/mailman/listinfo/mmai
Assessment:
- Assignments : 30%
- video shot detection and summary
- content-based image retrieval (CBIR)
- video concept detection/classification (??)
- Midterm Exam: 20%
- Final Project: 50%
Textbook: None. We will cover some active research areas
not yet included in any established textbook. Instead, we will provide
plenty of papers and reference books.
Course Outline
Lecture 01 - Introduction (09/13/2011, Tuesday)
- Introduction of the course and topics to be covered
- Demo of example systems for audio/visual analysis/indexing/retrieval
- Logistics: grading, homework, references, etc.
- MATLAB introduction by TA
- Reading:
- S Keshav, "How to read a paper," SIGCOMM Comput. Commun. Rev.
37, 3 (Jul. 2007), 83-84.
- Peter Lyman, et al., "How Much Information? 2003," University of
California at Berkeley.
- Dan Ellis, "Extracting Information from Music Audio,"
Communications of the ACM, special issue on Music Information
Retrieval, vol. 49, no. 8, pp.32-37, August 2006.
- Howard D. Wactlar, Alexander G. Hauptmann, Michael G. Christel,
Ricky A. Houghton, Andreas M. Olligschlaeger, "Complementary Video
and Audio Analysis for Broadcast News Archives," Communications of
the ACM, 43(2):42-47.
- Shih-Fu Chang, Wei-Ying Ma, Arnold Smeulders, "Recent Advances
and Challenges of Semantic Image/Video Search," In IEEE
International Conference on Acoustics, Speech, and Signal
Processing (ICASSP), Hawaii, USA, April 2007.
- Xing Xie et al. Mobile Search With Multimodal Queries.
Proceedings of the IEEE, (2008) vol. 96 (4) pp. 589-601
Lecture 02/03 - Video/Image Standards and Syntax Analysis (09/20/2011,
09/27/2011, Tuesday)
- Introduction of video/image standards
- Video syntax analysis
- HW#1 - video shot detection & summary (test videos) (due: Oct. 11,
2011)
- Reading:
- D. Le Gall, "MPEG: A Video Compression Standard for Multimedia
Applications," Communications of ACM, April 1991, Vol 34, No. 4,
pp. 46-58.
- J.S. Boreczky, L.A. Rowe, "Comparison of video shot boundary
detection techniques," Proc of SPIE- Storage and Retrieval for
Still Image and Video Databases IV, Vol. 2670, San Diego, 1996.
(overview paper, must-read)
- I. Koprinska, S. Carrato, "Temporal video segmentation: a
survey," Signal Processing: Image Communication, vol. 16, pp.
477-500, 2001. (overview paper, must-read) (Sec. 3.2 & 3.3,
skipped)
- J.S. Boreczky, L.D. Wilcox, "A hidden Markov model framework for
video segmentation using audio and image features," in Proc. Int.
Conf. Acoustics, Speech, and Signal Processing (ICASSP-98), Vol.
6, Seattle, WA, May 1998.
- S. Uchihashi and J. Foote, "Summarizing Video Using a Shot
Importance Measure and a Frame-Packing Algorithm," ICASSP 1999 (must-read).
- "Comp2Watch: Enhancing the Mobile Video Browsing Experience,"
Yu-Ming Hsu, Ming-Kuang Tsai, Yen-Liang Lin, Winston Hsu, ACM
Multimedia 2011 Workshop on Interactive Multimedia on Mobile and
Portable Devices (IMMPD 2011).
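As a minimal sketch of the histogram-difference idea surveyed in the shot boundary readings above, the following Python snippet declares a cut whenever the L1 distance between consecutive frame histograms exceeds a threshold. The function names, the flat-list frame representation, the bin count, and the threshold are illustrative assumptions, not part of the course material or of HW#1.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized gray-level histogram of a frame (flat list of ints in 0..255)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = float(len(frame))
    return [c / total for c in counts]


def detect_cuts(frames, threshold=0.5):
    """Return every index i where a cut is declared between frames i-1 and i."""
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # L1 distance between two normalized histograms lies in [0, 2].
        difference = sum(abs(a - b) for a, b in zip(previous, current))
        if difference > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Toy "video": three dark frames followed by two bright frames.
dark = [10] * 16
bright = [240] * 16
print(detect_cuts([dark, dark, dark, bright, bright]))  # -> [3]
```

A fixed global threshold only catches abrupt cuts; gradual transitions such as fades and dissolves need the more elaborate schemes discussed in the survey papers.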
Lecture 04 - Color for Content-Based Image Retrieval (10/04/2011,
Tuesday)

- Overview of similarity-based image/video retrieval
- Color features and metrics for CBIR
- Reading:
- Chapter 11 of book [Castelli'01] (overview paper, must-read)
- Y. Rui, T.S. Huang, and S.-F. Chang, "Image Retrieval: Current
Techniques, Promising Directions, and Open Issues," J. Visual
Comm. and Image Representation, vol. 10, no. 1, pp. 39-62, 1999.
(overview paper, must-read)
- M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B.
Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and
P. Yanker. Query by image and video content: The QBIC system. In
IEEE Computer, volume 28, pages 23-31, 1995.
- John R. Smith, Shih-Fu Chang, "VisualSEEk: a Fully Automated
Content-Based Image Query System," In ACM Multimedia, Boston, MA,
November 1996.
- Kieran McDonald, Alan F. Smeaton, "A Comparison of Score, Rank
and Probability-Based Fusion Methods for Video Shot Retrieval,"
CIVR 2005. (multimodal fusion)
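To make "color features and metrics for CBIR" concrete, here is a small sketch of one classic combination: a quantized joint RGB histogram compared with histogram intersection. This is one common choice among many, not necessarily the one used in the lecture or in HW#2; the function names, the 4-bins-per-channel quantization, and the toy images are assumptions for illustration.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram; pixels is a list of (r, g, b) in 0..255."""
    n = bins_per_channel
    hist = [0.0] * (n * n * n)
    for r, g, b in pixels:
        index = (r * n // 256) * n * n + (g * n // 256) * n + (b * n // 256)
        hist[index] += 1.0
    total = float(len(pixels))
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_similarity(query_pixels, database):
    """database maps image name -> pixel list; returns names, most similar first."""
    q = color_histogram(query_pixels)
    scores = {name: histogram_intersection(q, color_histogram(px))
              for name, px in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy images: one pure red query, one mostly red image, one pure blue image.
red = [(250, 10, 10)] * 8
mostly_red = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 2
blue = [(10, 10, 250)] * 8
print(rank_by_similarity(red, {"blue": blue, "mostly_red": mostly_red}))
# -> ['mostly_red', 'blue']
```

Global histograms ignore spatial layout, which is one motivation for the region-based queries in systems such as VisualSEEk.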
Lecture 05/06 - Texture and Shape for CBIR (10/11/2011, 10/18/2011,
Tuesday)

- Texture and shape features in statistical & spectral domains
- HW#2 - Content-based Image Retrieval (due: Nov. 8, 2011) (+ dataset)
- Reading:
- "Texture features for browsing and retrieval of image data," B.
S. Manjunath and W.Y. Ma, IEEE Transactions on Pattern Analysis
and Machine Intelligence (PAMI), vol.18, no.8, pp.837-42, Aug
1996. (must-read)
- "Filtering for Texture Classification: A Comparative Study,"
Randen, et al., IEEE Trans. Image Processing, 1999.
- "Textural Features Corresponding to Visual Perception," Hideyuki
Tamura, Shunji Mori, Takashi Yamawaki, IEEE Transactions on
Systems, Man, and Cybernetics, No. 6, June 1978.
- "Benchmarking of image features for content-based retrieval,"
Wei-Ying Ma and Hong Jiang Zhang, Record of the 32nd Asilomar
Conf. on Signals, Systems & Computers, 1998, Vol 1.
- "Color and Texture Descriptors," B. S. Manjunath, Jens-Rainer
Ohm, Vinod V. Vasudevan, Akio Yamada, IEEE Transactions on
Circuits and Systems for Video Technology, Vol 11, No. 6, June
2001.
- "MPEG-7 visual shape descriptors," Miroslaw Bober, IEEE
Transactions on Circuits and Systems for Video Technology, Vol 11,
No. 6, June 2001.
Lecture 07/08 - Multidimensional Indexing (10/25/2011, 11/01/2011,
Tuesday)
- Guest Speaker: Kuan-Ting Chen (PhD Candidate)
- Outline:
- The curse of dimensionality
- The problems of linear search
- Overview of efficient indexing methods (tree-based vs.
hashing vs. inverted structure)
- Tree-based indexing methods
- Approximate nearest neighbor search in multimedia database
- Reading:
- Vittorio Castelli, "Multidimensional Indexing Structures for
Content-based Retrieval," IBM Research Report, 2001. (overview
paper, must-read)
- Gaede, V., Gunther, O., "Multidimensional access methods,"
ACM Computing Surveys, Vol. 30, No. 2, 1998. (overview paper)
- Guttman, A., "R-trees: A dynamic index for spatial search," ACM
SIGMOD International Conference on Management of Data, 47–54.
1984.
- Beckmann N et al., "The R*-tree: An efficient and robust access
method for points and rectangles," ACM SIGMOD International
Conference on Management of Data, 322-331, 1990.
- P. Indyk et al., "Approximate Nearest Neighbors: Towards
Removing the Curse of Dimensionality," STOC 1998.
- S. Arya et al., "An optimal algorithm for approximate nearest
neighbor searching in Fixed Dimensions," Journal of the ACM, 1998.
Lecture 09 - Hash-based Indexing (11/08/2011, Tuesday) 
- Hash-based indexing, hashing functions for LSH
- Reading:
- Malcolm Slaney and Michael Casey, "Locality-Sensitive Hashing
for Finding Nearest Neighbors," IEEE Signal Processing Magazine,
2008. (must-read)
- Alexandr Andoni and Piotr Indyk, "Near-Optimal Hashing
Algorithms for Approximate Nearest Neighbor in High Dimensions,"
Communications of the ACM, 2008. (good paper)
- M. Datar, et al. Locality-sensitive hashing scheme based on
p-stable distributions. SoCG 2004.
- P. Indyk et al. Approximate Nearest Neighbors: Towards
Removing the Curse of Dimensionality. STOC 1998.
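The hashing functions for LSH mentioned above can be illustrated with a toy single-table index in the style of the p-stable scheme for Euclidean distance, where each hash is h(v) = floor((a . v + b) / w) with a Gaussian vector a and a uniform offset b. This is a sketch under assumptions: the class name, parameter defaults, and single-table design are hypothetical, and a practical index would use several tables to raise recall.

```python
import math
import random
from collections import defaultdict


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


class L2LSHIndex:
    """Toy single-table LSH index for Euclidean distance (illustrative only)."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # One Gaussian (2-stable) projection and one uniform offset per hash.
        self.projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                            for _ in range(num_hashes)]
        self.offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def _key(self, vector):
        # Concatenate h(v) = floor((a . v + b) / w) over all hash functions.
        return tuple(
            int(math.floor((sum(a * x for a, x in zip(proj, vector)) + b) / self.w))
            for proj, b in zip(self.projections, self.offsets)
        )

    def add(self, name, vector):
        self.buckets[self._key(vector)].append((name, vector))

    def query(self, vector):
        """Names stored in the query's bucket, nearest first by exact distance."""
        candidates = self.buckets.get(self._key(vector), [])
        ranked = sorted(candidates, key=lambda item: euclidean(item[1], vector))
        return [name for name, _ in ranked]


index = L2LSHIndex(dim=3)
index.add("a", [1.0, 2.0, 3.0])
index.add("far", [90.0, -50.0, 70.0])
# An identical vector always hashes to the same bucket, so "a" is found first.
print(index.query([1.0, 2.0, 3.0])[0])
```

Only points sharing the query's bucket are compared exactly, which is where the speedup over linear search comes from; the price is that a true neighbor falling into a different bucket is missed.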
Lecture 10 - Feature Reduction and Multidimensional Indexing
(11/22/2011, Tuesday) 
- Feature reduction for high-dimensional data
- SVD demo (sample code + JPEG image)
- Reading:
- "Eigenfaces for recognition," M. Turk, A. Pentland, Journal of
Cognitive Neuroscience, 1991. (must-read)
- "Matrices, vector spaces, and information retrieval," Michael W.
Berry, Zlatko Drmač, and Elizabeth R. Jessup, SIAM Review,
41(2):335-362, June 1999. (SVD related)
- "Nonlinear dimensionality reduction by locally linear
embedding," Roweis & Saul, Science, 2000.
- Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas, "A Metric
for Distributions with Applications to Image Databases," IEEE
International Conference on Computer Vision, 1998.
Lecture 11 - Midterm (11/29/2011, Tuesday)
- Open-book style.
- The exam covers material through Lecture 09 (hash-based indexing).
Lecture 12/13 - Overview for Concept (Semantic)-based Image/Video
Analysis and Retrieval (12/06/2011, 12/13/2011, Tuesday)
- Briefing on research regarding concept design, extraction, and
search.
- Includes parts of tutorials presented at top conferences (e.g., ACM
Multimedia and SIGIR 2008, Multimedia 2009, and ICASSP 2009).
- Reading:
- M. R. Naphade, J. R. Smith, J. Tesic, S. F. Chang, W. Hsu, L.
Kennedy, A. Hauptmann and J. Curtis, "Large-scale concept ontology
for multimedia," IEEE MultiMedia Magazine, 13 (3), Sep. 2006. (must-read)
- Lexing Xie, Rong Yan, "Extracting Semantics from Multimedia
Content: Challenges and Solutions," In Multimedia Content
Analysis: Theory and Applications, A. Divakaran Ed., Springer,
2008. (must-read)
- M. Naphade, J. R. Smith, "On the Detection of Semantic Concepts
at TRECVID," ACM Multimedia 2004.
- A. G. Hauptmann, R. Yan, W.H. Lin, M. Christel and H. Wactlar,
"Can High-Level Concepts Fill the Semantic Gap in Video Retrieval?
A Case Study With Broadcast News." In IEEE Transactions on
Multimedia, Aug. 2007.
- N. Kumar, et al., "Describable Visual Attributes for Face
Verification and Image Search," IEEE Trans. on PAMI, October 2011.
- A.-J. Cheng, et al., Personalized Travel Recommendation by
Mining People Attributes from Community-Contributed Photos, ACM
Multimedia 2011.
- Y.-H. Lei, et al., Photo Search by Face Positions and Facial
Attributes on Touch Devices, ACM Multimedia 2011 (Grand
Challenge).
Lecture 14 - Video Structure/Event Discovery (12/20/2011, Tuesday)
- Generative models -- HMM
- Video structure discovery
- Broadcast news videos
- Maximum Entropy method
- HMM method
- Sports highlights
- Reading:
- L. Xie, S.-F. Chang, A. Divakaran, and H. Sun, "Structure
analysis of soccer video with hidden Markov models," ICASSP, May
2002. (must-read)
- Xinguo Yu, Changsheng Xu, Hon Wai Leong, Qi Tian, Qing
Tang, Kong Wah Wan, "Trajectory-based ball detection and
tracking with applications to semantic analysis of broadcast
soccer video," ACM Multimedia, 2003.
- L. Chaisorn, T.-S Chua and C.-H. Lee, "The segmentation of
news video into story units," ICME 2002.
- Winston H. Hsu and Shih-Fu Chang, "A Statistical Framework for
Fusing Mid-level Perceptual Features in News Story
Segmentation," ICME, Baltimore, 2003.
- Yoshua Bengio, "Markovian Models for Sequential Data," 1999
- Jeff A. Bilmes, "What HMMs Can Do," IEICE Transactions on
Information and Systems, 2006.
- Bayes Net Toolbox for MATLAB (for graphical models)
Lecture 15 - Clustering Approaches for Visual Document Organization
(12/27/2011, Tuesday)
- Conventional Clustering
- K-means
- GMM
- Hierarchical clustering
- Video clip similarity
- Earth Mover's Distance
- Video Signature
- News story threading (tracking/clustering)
- Reading:
- Chapter 10 - Unsupervised Learning and Clustering of [Duda'02]
- A. K. Jain, M. N. Murty and P. J. Flynn, Data Clustering: A
Review, ACM Computing Surveys, Sep. 1999.
- Greg Hamerly, Charles Elkan. Learning the K in K-Means, NIPS
2003.
- Natsev et al., Learning the semantics of multimedia queries and concepts from a small number of examples, ACM Multimedia 2005.
- Yuxin Peng, Chong-Wah Ngo, "EMD-Based Video Clip Retrieval by
Many-to-Many Matching," CIVR 2005: 71-81.
- Cheung, S.-C. and A. Zakhor, "Estimation of web video
multiplicity," in Proceedings of the SPIE -- Internet Imaging,
Volume 3964, pp. 34-36, 2000.
- Winston H. Hsu, Shih-Fu Chang: Topic Tracking Across Broadcast
News Videos with Visual Duplicates and Semantic Concepts. ICIP
2006: 141-144
- Xiao Wu, et al., "Real-Time Near-Duplicate Elimination for Web
Video Search with Content and Context," IEEE TMM, February 2009.
(must-read)
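For the K-means item in the conventional clustering topics above, a minimal sketch of Lloyd's algorithm is given below. Seeding the centroids with the first k points is an assumption made purely to keep the example deterministic; real use would normally seed randomly or with k-means++. The function names and toy data are likewise illustrative.

```python
def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, max_iterations=20):
    """Lloyd's algorithm; returns (labels, centroids)."""
    # Assumption for determinism: seed with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = kmeans(points, k=2)
print(labels)     # -> [0, 0, 1, 1]
print(centroids)  # -> [[0.0, 0.5], [10.0, 10.5]]
```

The number of clusters k must be chosen in advance here, which is the limitation the Hamerly and Elkan reading addresses.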
Lecture 16 - Video Retrieval and Local Features (01/03/2012, Tuesday)
- Query by multiple examples
- Video threading and retrieval
- Local features
- Advanced issues in multimedia analysis and retrieval
- Reading:
- Apostol Natsev, Alexander Haubold, Jelena Tesic, Lexing Xie, Rong Yan, "Semantic concept-based query expansion and re-ranking for multimedia retrieval," ACM Multimedia 2007. (recommended)
- "Video Google: A Text Retrieval Approach to Object Matching in Videos," J. Sivic, and A. Zisserman, ICCV, 2003.
- A. Haubold and A. Natsev, "Semantic multimedia retrieval using lexical query expansion and model-based reranking," In International Conference on Multimedia and Expo (ICME), 2006.
- "Video Search Reranking through Random Walk over Document-Level Context Graph," Winston H. Hsu, Lyndon Kennedy, and Shih-Fu Chang, ACM Multimedia 2007, Augsburg, Germany, September 23-29, 2007.
Lecture 17 - Project Presentation (01/10/2012, Tuesday)
- Presentation: 10 min/group.
- Final report: 01/16/2012, Monday
- Presentation list:
- Bring your mugs. We will provide free coffee and tea.
Course Projects
Purpose: posting student projects in the course and recruiting
project members
PAST COURSE PROJECTS 
Course Material
Books:
- [Castelli'01] Image Databases: Search and Retrieval of Digital
Imagery , by Vittorio Castelli and Lawrence D. Bergman,
Wiley-Interscience, 2001
- [Gold'99] Speech and Audio Signal Processing: Processing and
Perception of Speech and Music, by Ben Gold and Nelson Morgan, Wiley,
1999
- [Bishop'06] Pattern Recognition and Machine Learning, by Christopher
M. Bishop, Springer, 2006
- [Alpaydin'04] Introduction to Machine Learning, by Ethem Alpaydin,
The MIT Press, 2004
- [Duda'02] Pattern Classification, by Richard Duda, et al., 2nd
Edition, Wiley-Interscience, 2000.