News Video Story Segmentation using Fusion of Multi-Level Multi-modal Features in TRECVID 2003

Winston Hsu, Lyndon Kennedy, Chih-Wei Huang, Shih-Fu Chang, Ching-Yung Lin, and Giridharan Iyengar

ICASSP 2004 reviewers' comments:

>> Reviewer 1
Good use of a maxent model.

Since the weights determined during training decide which features are useful, it would help to give some examples of features which turned out to have high weights.

A few more references to video segmentation work from other groups would be useful.

>> Reviewer 2
An interesting paper. Quite comprehensive in the use of data fusion.

However, some things are quite unclear.
1. Your feature wrapper idea and the binary encoding of the shot locations from each feature index are certainly unclear.
2. What is the "motion intensity" on page 3?
3. What do you mean by the term "reference boundaries"?

An ambitious project nevertheless.

>> Reviewer 3
The presented work is interesting, but as described it is not very relevant to a conference on signal processing. The authors are advised to submit their work to a more specialized conference on multimedia topics.

-----