As the number of digital images grows explosively, techniques for viewing and browsing images become increasingly important. Image thumbnailing selects and crops the significant portions of an image as a thumbnail, in order to make a better impression on viewers and to make processes involving large numbers of images more efficient.
There are three main approaches to image thumbnailing: saliency analysis, face detection, and gaze detection. All of them detect the Region of Interest (ROI) of an image, but with different algorithms and assumptions. For automatic thumbnailing, saliency analysis and face detection are more suitable, since the gaze-based approach requires additional work to record the actual gaze of viewers. However, both the saliency-based and face-based approaches rest on specific assumptions, so their performance depends on the properties of the images.
Since most face detection algorithms are weak at detecting non-frontal faces, we propose a multi-view face detection algorithm that detects five different views of faces, including frontal, half-profile, and profile ones. Moreover, we integrate saliency analysis and face detection into a fused approach for image thumbnailing. It combines the advantages of the saliency-based and face-based approaches and is able to produce appropriate and pleasing thumbnails that take photography aesthetics into consideration.
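The fused approach itself is described in the paper; purely as an illustration of the idea, the toy sketch below unions a placeholder saliency mask with detected face boxes and crops their joint bounding box. The saliency function and threshold here are simplistic stand-ins, not the paper's actual method.

```python
import numpy as np

def saliency_map(gray):
    """Toy saliency: absolute deviation from the mean intensity.
    (A stand-in; the paper's saliency analysis is more sophisticated.)"""
    return np.abs(gray - gray.mean())

def fused_roi(gray, face_boxes, threshold=0.5):
    """Union the salient region with detected face boxes into one crop ROI.
    face_boxes: list of (top, left, bottom, right) from a face detector."""
    sal = saliency_map(gray)
    mask = sal >= threshold * sal.max()
    for (t, l, b, r) in face_boxes:
        mask[t:b, l:r] = True          # faces are always kept in the thumbnail
    ys, xs = np.nonzero(mask)
    return ys.min(), xs.min(), ys.max() + 1, xs.max() + 1  # bounding box

# usage: a synthetic image with one bright (salient) patch plus one face box
img = np.zeros((100, 100))
img[60:70, 60:70] = 1.0
roi = fused_roi(img, [(10, 10, 30, 30)])   # covers both regions
```

A real implementation would additionally adjust the crop toward the target aspect ratio and apply aesthetic rules (e.g., rule of thirds) before resizing.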
Chih-Chau Ma, Yi-Hsuan Yang, Winston Hsu,
“Image Thumbnailing via Multi-View Face Detection and Saliency Analysis”.
Image thumbnailing is an essential technique to efficiently visualize large-scale consumer photos or image search results. In this paper, we propose a hierarchical multi-view face detection algorithm for image thumbnailing, and a unified framework which aggregates both the low-level saliency and high-level semantics through detected faces to augment thumbnail generation with the consideration of photography aesthetics. The approach can produce more informative and pleasant thumbnails in comparison with prior work with limited representations. We also investigate the effective feature sets and informative feature dimensions for discriminative multi-view face detection.
This page includes the image data used for image thumbnailing and classifier training. The data are collected from the web via search engines. The images are in JPG format with various sizes for image thumbnailing, and a fixed size of 100×100 pixels for classifier training. In the multi-view scheme, five different views of faces are considered: left (left 90°), lefthalf (left 45°), frontal (0°), righthalf (right 45°), right (right 90°). Angles are measured from the viewpoint of the image viewer.
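The hierarchical scheme (face/non-face first, then one of the five views) can be outlined as below. The classifier interfaces are hypothetical placeholders; only the view labels and angles come from this page, and the signed-angle convention is our own.

```python
# Signed angles from the viewer's perspective (left negative, right positive);
# the sign convention is an assumption for illustration.
VIEW_ANGLES = {
    "left": -90, "lefthalf": -45, "frontal": 0,
    "righthalf": 45, "right": 90,
}

def detect_view(patch, is_face, classify_view):
    """Two-stage pipeline: is_face and classify_view stand in for the
    trained face/non-face and view classifiers."""
    if not is_face(patch):
        return None                      # rejected by the first stage
    view = classify_view(patch)          # one of the five view labels
    return view, VIEW_ANGLES[view]

# usage with dummy stand-in classifiers
result = detect_view("patch", lambda p: True, lambda p: "lefthalf")
```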
The images in this data are used for the training and validation of two classifiers: the face/non-face classifier and the view classifier. Most of the face images are collected from the web, and the head areas in the images are cropped manually. The non-face images, as well as some positive face images used for the face classifier, are selected from the true and false positives of a face detector. All of these images are in 100×100 JPG format.
face/: images for the face classifier.
posface/: positive face images for training, with five different views (left, lefthalf, frontal, righthalf, right).
nonface/: non-face images for training as the negative data, which are the false positives of a face detector.
validation/: validation images for the face classifier.
view/: images for the view classifier.
left/, lefthalf/, frontal/, righthalf/, right/: face images in five different views for training.
validation/: validation images for the view classifier.