Visual Entailment Task for Visually-Grounded Language Learning

  • 2018-11-26 18:37:25
  • Ning Xie, Farley Lai, Derek Doran, Asim Kadav


We introduce a new inference task, Visual Entailment (VE), which differs from traditional Textual Entailment (TE) tasks in that the premise is defined by an image rather than a natural language sentence. A novel dataset, SNLI-VE, is proposed for VE tasks based on the Stanford Natural Language Inference corpus and Flickr30K. We introduce a differentiable architecture called the Explainable Visual Entailment model (EVE) to tackle the VE problem. EVE and several other state-of-the-art visual question answering (VQA) based models are evaluated on the SNLI-VE dataset, facilitating grounded language understanding and providing insights on how modern VQA-based models perform.

