RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

Abstract

Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and tracking. Unfortunately, there is still an enormous performance gap between artificial vision systems and human intelligence in terms of higher-level vision problems, especially ones involving reasoning. Earlier attempts at equipping machines with high-level reasoning have hovered around Visual Question Answering (VQA), one typical task associating vision and language understanding. In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation. Unlike previous work on measuring abstract reasoning using RPM, we establish a semantic link between vision and reasoning by providing structure representation. This addition enables a new type of abstract reasoning by jointly operating on the structure representation. Machine reasoning ability using modern computer vision is evaluated on this newly proposed dataset. We also provide human performance as a reference. Finally, we show consistent improvement across all models by incorporating a simple neural module that combines visual understanding and structure reasoning.
