Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification from the Bottom Up

Abstract

Given a training dataset composed of images and corresponding category labels, deep convolutional neural networks show a strong ability in mining discriminative parts for image classification. However, deep convolutional neural networks trained with image-level labels only tend to focus on the most discriminative parts while missing other object parts, which could provide complementary information. In this paper, we approach this problem from a different perspective. We build complementary parts models in a weakly supervised manner to retrieve information suppressed by dominant object parts detected by convolutional neural networks. Given image-level labels only, we first extract rough object instances by performing weakly supervised object detection and instance segmentation using Mask R-CNN and CRF-based segmentation. Then we estimate and search for the best parts model for each object instance under the principle of preserving as much diversity as possible. In the last stage, we build a bi-directional long short-term memory (LSTM) network to fuse and encode the partial information of these complementary parts into a comprehensive feature for image classification. Experimental results indicate that the proposed method not only achieves significant improvement over our baseline models, but also outperforms state-of-the-art algorithms by a large margin (6.7%, 2.8%, and 5.2%, respectively) on Stanford Dogs 120, Caltech-UCSD Birds 2011-200, and Caltech 256.
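The final stage, which fuses a sequence of part features into one image-level descriptor with a bi-directional LSTM, can be illustrated with a small NumPy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the weights are random and untrained, the part features are assumed to arrive as a `(num_parts, feat_dim)` array in some fixed order, and mean-pooling the concatenated forward and backward hidden states is our own choice of readout. The names `LSTMCell` and `fuse_parts_bilstm` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Single LSTM cell with random, untrained weights (illustrative only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One stacked matrix for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        H = self.hidden_dim
        z = self.W @ np.concatenate([x, h]) + self.b
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        g = np.tanh(z[2 * H:3 * H])  # candidate cell state
        o = sigmoid(z[3 * H:4 * H])  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

    def run(self, xs):
        """Run over the rows of xs in order; return all hidden states."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return np.stack(outputs)


def fuse_parts_bilstm(part_features, hidden_dim=8, seed=0):
    """Encode (num_parts, feat_dim) part features into one 2*hidden_dim vector."""
    rng = np.random.default_rng(seed)
    feat_dim = part_features.shape[1]
    fwd = LSTMCell(feat_dim, hidden_dim, rng)
    bwd = LSTMCell(feat_dim, hidden_dim, rng)
    h_fwd = fwd.run(part_features)              # reads parts first to last
    h_bwd = bwd.run(part_features[::-1])[::-1]  # reads parts last to first, re-aligned
    per_part = np.concatenate([h_fwd, h_bwd], axis=1)  # (num_parts, 2*hidden_dim)
    # Assumption: mean-pool the per-part states into a single descriptor.
    return per_part.mean(axis=0)


# Usage: five hypothetical part features of dimension 16.
parts = np.random.default_rng(1).normal(size=(5, 16))
feature = fuse_parts_bilstm(parts)
print(feature.shape)  # (16,)
```

Reading the parts in both directions lets each part's state depend on every other part, which is one way to read "fuse and encode the partial information"; in a real system the fused vector would feed a trained classifier.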
