Learning Where to Fixate on Foveated Images

Abstract

Foveation, the ability to sequentially acquire high-acuity regions of a scene viewed initially at low-acuity, is a key property of biological vision systems. In a computer vision system, foveation is also desired to increase data efficiency and derive task-relevant features. Yet, most existing deep learning models lack the ability to foveate. In this paper, we propose a deep reinforcement learning-based foveation model, DRIFT, and apply it to challenging fine-grained classification tasks. Training of DRIFT requires only image-level category labels and encourages fixations to contain discriminative information while maintaining data efficiency. Specifically, we formulate foveation as a sequential decision-making process and train a foveation actor network with a novel Deep Deterministic Policy Gradient by Conditioned Critic and Coaching (DDPGC3) algorithm. In addition, we propose to shape the reward to provide informative feedback after each fixation to better guide the RL training. We demonstrate the effectiveness of our method on five fine-grained classification benchmark datasets, and show that the proposed approach achieves state-of-the-art performance using an order-of-magnitude fewer pixels.
