MAREO: Memory- and Attention-based visual REasOning

Abstract

Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Attention and memory are two systems known to play a critical role in our ability to selectively maintain and manipulate behaviorally relevant visual information to solve some of the most challenging visual reasoning tasks. Here, we present a novel architecture for visual reasoning inspired by the cognitive-science literature on visual reasoning: the Memory- and Attention-based (visual) REasOning (MAREO) architecture. MAREO instantiates an active-vision theory, which posits that the brain solves complex visual reasoning problems compositionally by learning to combine previously learned elementary visual operations into more complex visual routines. MAREO learns to solve visual reasoning tasks via sequences of attention shifts that route task-relevant visual information into a memory bank and maintain it there, using a multi-head transformer module. Visual routines are then deployed by a dedicated reasoning module trained to judge various relations between objects in the scenes. Experiments on tasks containing complex visual relations (the SVRT challenge) and on the same-different differentiation, relational match-to-sample, Raven's, and identity-rules tasks from the ART challenge demonstrate MAREO's ability to learn visual routines in a robust and sample-efficient manner. We also demonstrate zero-shot generalization to unseen tasks and the compositional nature of the architecture.
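
To make the pipeline described above concrete, the following is a minimal sketch of the general idea only, not the MAREO implementation. It assumes a single attention head instead of the multi-head transformer module, fixed query vectors instead of learned ones, and a toy cosine-similarity check standing in for the trained reasoning module. All function names (`attention_shift`, `run_routine`, `same_different`) and the threshold value are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_shift(query, features):
    """One attention shift (assumed single-head, scaled dot-product).

    query:    (d,) vector saying what to look for.
    features: (n_locations, d) visual features, one row per location.
    Returns the attended feature vector (d,) and the weights (n_locations,).
    """
    scores = features @ query / np.sqrt(features.shape[1])
    weights = softmax(scores)
    return weights @ features, weights

def run_routine(features, queries):
    """Apply a sequence of attention shifts and write each result to memory.

    Returns a memory bank of shape (n_steps, d), one slot per shift.
    """
    memory = [attention_shift(q, features)[0] for q in queries]
    return np.stack(memory)

def same_different(memory, threshold=0.9):
    """Toy stand-in for a reasoning module: compare the first two slots.

    The cosine-similarity rule and the threshold are illustrative assumptions.
    """
    a, b = memory[0], memory[1]
    cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return bool(cosine > threshold)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(16, 8))   # 16 locations, 8-dim features
    queries = rng.normal(size=(3, 8))     # 3 attention shifts
    memory = run_routine(features, queries)
    print(memory.shape)
    print(same_different(memory))
```

The sketch separates the two stages named in the abstract: a routine that fills the memory bank through successive attention shifts, and a separate step that judges a relation from the stored contents. In a trained system both the queries and the relation judgment would be learned.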
