Compact Deep Models

Overview

Deep models have recently attracted considerable attention for their effectiveness on many computer vision problems. These models, however, are often very large, which makes them difficult to deploy on edge and embedded devices. We have proposed several compact deep models that take up only a few megabytes of memory and are suitable for such devices.

FSA-Net for head pose estimation. We propose a method for head pose estimation from a single image. Previous methods often predict head poses through landmark or depth estimation and thus require more computation than necessary. Our method is instead based on regression and feature aggregation. To keep the model compact, we employ the soft stagewise regression scheme. Existing feature aggregation methods treat their inputs as a bag of features and thus ignore the spatial relationships within a feature map. We propose to learn a fine-grained structure mapping that spatially groups features before aggregation. The fine-grained structure provides part-based information and pooled values. By using learnable and non-learnable importance over spatial locations, different model variants can be generated to form a complementary ensemble. Experiments show that our method outperforms state-of-the-art methods, including both landmark-free methods and those based on landmark or depth estimation. With only a single RGB frame as input, our method even outperforms methods that use multi-modality information (RGB-D, RGB-Time) when estimating the yaw angle. Furthermore, the memory overhead of our model is 100 times smaller than that of previous methods. The work was published at CVPR 2019.
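The idea of weighting spatial locations by importance and grouping them before aggregation can be illustrated with a minimal NumPy sketch. It is not the FSA-Net implementation: the function names, the softmax-normalized mapping, and the random `group_weights` (standing in for learned parameters) are all assumptions made for illustration. Only two non-learnable importance choices, per-location variance across channels and uniform weighting, are shown.

```python
import numpy as np


def spatial_importance(feature_map, mode="variance"):
    """Return one importance value per spatial location, shape (H*W,).

    Both modes are non-learnable; a learnable variant would replace this
    with a trained scoring layer (not shown here).
    """
    h, w, _ = feature_map.shape
    if mode == "variance":
        # Variance across channels at each location.
        return feature_map.var(axis=-1).reshape(h * w)
    if mode == "uniform":
        return np.ones(h * w)
    raise ValueError(f"unknown importance mode: {mode}")


def group_features(feature_map, group_weights, mode="variance"):
    """Pool an (H, W, C) feature map into n grouped features of shape (n, C).

    group_weights: hypothetical (n, H*W) array standing in for learned
    parameters that decide which locations each group attends to.
    """
    h, w, c = feature_map.shape
    flat = feature_map.reshape(h * w, c)
    importance = spatial_importance(feature_map, mode)

    # Assumed mapping: importance-modulated logits, softmax over locations,
    # so each group is a convex combination of spatial features.
    logits = group_weights * importance
    logits = logits - logits.max(axis=1, keepdims=True)
    mapping = np.exp(logits)
    mapping = mapping / mapping.sum(axis=1, keepdims=True)

    return mapping @ flat, mapping


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 16))      # toy feature map
weights = rng.normal(size=(3, 8 * 8))       # stand-in for learned weights
grouped, mapping = group_features(features, weights, mode="variance")
print(grouped.shape)                        # (3, 16)
```

Unlike a bag-of-features pooling, each grouped feature here depends on where in the map its contributing features sit; switching `mode` yields a different variant that could be combined with the others in an ensemble.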
SSR-Net for age estimation. We propose a novel CNN model called Soft Stagewise Regression Network (SSR-Net) for age estimation from a single image with a compact model size. Inspired by DEX, we address age estimation by performing multi-class classification and then turning the classification results into a regression output by calculating the expected value. SSR-Net takes a coarse-to-fine strategy and performs multi-class classification in multiple stages. Each stage is responsible only for refining the decision of the previous stage to obtain a more accurate age estimate. Thus, each stage handles only a few classes and requires few neurons, which greatly reduces the model size. To address the quantization issue introduced by grouping ages into classes, SSR-Net assigns a dynamic range to each age class by allowing it to be shifted and scaled according to the input face image. Both the multi-stage strategy and the dynamic range are incorporated into the formulation of soft stagewise regression, and a novel network architecture is proposed to carry it out. The resulting SSR-Net model is very compact and takes only 0.32 MB. Despite its compact size, SSR-Net's performance approaches that of state-of-the-art methods whose model sizes are often more than 1500 times larger. The work was published at IJCAI 2018.
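The soft stagewise regression described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the released SSR-Net code: the function and argument names are hypothetical, and the formula is reconstructed from the description as y = V · Σ_k Σ_i p_i^(k) (i + η_i^(k)) / Π_{j≤k} s_j (1 + Δ_j), where stage k has s_k bins with probabilities p^(k), η shifts the bin indices, Δ scales the bin width, and V is the width of the age range.

```python
import numpy as np


def soft_stagewise_regression(probs, shifts, scales, value_range):
    """Combine per-stage class probabilities into one regression value.

    probs:  list of K arrays; probs[k] has shape (s_k,) and sums to 1.
    shifts: list of K arrays of the same shapes (per-bin index shifts, eta).
    scales: list of K floats (per-stage bin-width adjustments, delta).
    value_range: width V of the target range, e.g. 90 for ages 0 to 90.
    """
    estimate = 0.0
    denom = 1.0
    for p, eta, delta in zip(probs, shifts, scales):
        p = np.asarray(p, dtype=float)
        s = p.shape[0]
        # Each stage subdivides the bins of the previous stage, so the
        # bin width shrinks by the (scaled) number of bins at every stage.
        denom *= s * (1.0 + delta)
        indices = np.arange(s) + np.asarray(eta, dtype=float)
        # Expected (shifted) bin index of this stage, in units of its bin width.
        estimate += np.sum(p * indices) / denom
    return value_range * estimate


# Three stages with three bins each, no shift or scaling (assumed toy values).
probs = [np.array([0.0, 1.0, 0.0]),
         np.array([0.0, 0.0, 1.0]),
         np.array([1.0, 0.0, 0.0])]
shifts = [np.zeros(3)] * 3
scales = [0.0] * 3
age = soft_stagewise_regression(probs, shifts, scales, value_range=90.0)
```

In this toy call the first stage selects the 30 to 60 bin and the second stage refines it by a further 20, so the estimate is 90 × (1/3 + 2/9 + 0/27) = 50. With a single stage and no shift or scaling, the function reduces to the expected-value regression over classes that DEX uses.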

Publications

FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image: Tsun-Yi Yang, Yi-Ting Chen, Yen-Yu Lin, Yung-Yu Chuang; CVPR 2019
SSR-Net: A Compact Soft Stagewise Regression Network for Age Estimation: Tsun-Yi Yang, Yi-Hsuan Huang, Yen-Yu Lin, Yung-Yu Chuang; IJCAI 2018

cyy -a-t- csie.ntu.edu.tw