Abstract
Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, but they often suffer from a limited number of parts and heavy computational cost. In this paper, we propose to learn such fine-grained features from hundreds of part proposals with a Trilinear Attention Sampling Network (TASN) in an efficient teacher-student manner. Specifically, TASN consists of 1) a trilinear attention module, which generates attention maps by modeling the inter-channel relationships, 2) an attention-based sampler, which highlights attended parts with high resolution, and 3) a feature distiller, which distills part features into a global one by weight sharing and feature preserving strategies. Extensive experiments verify that TASN yields the best performance under the same settings as the most competitive approaches on the iNaturalist-2017, CUB-Bird, and Stanford-Cars datasets.