Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks

  • 2018-02-09 18:55:34
  • Matthew Ricci, Junkyung Kim, Thomas Serre

Abstract

The robust and efficient recognition of visual relations in images is a hallmark of biological vision. Here, we argue that, despite recent progress in visual recognition, modern machine vision algorithms are severely limited in their ability to learn visual relations. Through controlled experiments, we demonstrate that visual-relation problems strain convolutional neural networks (CNNs). The networks eventually break altogether when rote memorization becomes impossible such as when the intra-class variability exceeds their capacity. We further show that another type of feedforward network, called a relational network (RN), which was shown to successfully solve seemingly difficult visual question answering (VQA) problems on the CLEVR datasets, suffers similar limitations. Motivated by the comparable success of biological vision, we argue that feedback mechanisms including working memory and attention are the key computational components underlying abstract visual reasoning.

 
