A Corpus for Reasoning About Natural Language Grounded in Photographs

  • 2018-11-01 16:47:44
  • Alane Suhr, Stephanie Zhou, Iris Zhang, Huajun Bai, Yoav Artzi

Abstract

We introduce a new dataset for joint reasoning about language and vision. The data contains 107,296 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a photograph. We present an approach for finding visually complex images and crowdsourcing linguistically diverse captions. Qualitative analysis shows the data requires complex reasoning about quantities, comparisons, and relationships between objects. Evaluation of state-of-the-art visual reasoning methods shows the data is a challenge for current methods.
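The task described above can be read as binary classification over (sentence, photograph) pairs. The following is a minimal sketch of that framing, not the corpus's actual file format or the authors' evaluation code: the `Example` fields, the `accuracy` helper, the `always_true` baseline, and the toy examples are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """One hypothetical example: a caption, a photograph, and a truth label."""
    sentence: str    # English caption written by a crowd worker
    image_path: str  # assumed local path to the paired photograph
    label: bool      # True if the caption is true about the photograph


def accuracy(predict: Callable[[str, str], bool],
             examples: Sequence[Example]) -> float:
    """Fraction of examples where the predicted truth value matches the label."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(predict(ex.sentence, ex.image_path) == ex.label
                  for ex in examples)
    return correct / len(examples)


def always_true(sentence: str, image_path: str) -> bool:
    """Trivial baseline that ignores both inputs and always answers True."""
    return True


# Invented toy data, not drawn from the corpus.
toy_examples = [
    Example("There are two dogs on the grass.", "img_001.jpg", True),
    Example("One bottle is taller than the others.", "img_002.jpg", False),
    Example("A bird is perched to the left of a cat.", "img_003.jpg", True),
]

toy_accuracy = accuracy(always_true, toy_examples)
```

Any model for the task would replace `always_true` with a function that actually inspects the sentence and the image; the constant baseline is included only to show how a prediction function plugs into the accuracy computation.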

 
