DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension

  • 2018-10-10 09:28:33
  • Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, Karthik Sankaranarayanan

Abstract

We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. DuoRC contains 186,089 unique question-answer pairs created from a collection of 7,680 pairs of movie plots, where each pair reflects two versions of the same movie, one from Wikipedia and the other from IMDb, written by two different authors. We asked crowdsourced workers to create questions from one version of the plot and a different set of workers to extract or synthesize answers from the other version. This unique characteristic of DuoRC, where questions and answers are created from different versions of a document narrating the same underlying story, ensures by design that there is very little lexical overlap between the questions created from one version and the segments containing the answer in the other version. Further, since the two versions differ in level of plot detail, narration style, vocabulary, etc., answering questions from the second version requires deeper language understanding and the incorporation of external background knowledge. Additionally, the narrative style of passages arising from movie plots (as opposed to the typical descriptive passages in existing datasets) calls for complex reasoning over events across multiple sentences. Indeed, we observe that state-of-the-art neural RC models that have achieved near-human performance on the SQuAD dataset exhibit very poor performance on DuoRC, even when coupled with traditional NLP techniques to address the challenges it presents (F1 score of 37.42% on DuoRC vs. 86% on SQuAD). This opens up several interesting research avenues wherein DuoRC could complement other RC datasets to explore novel neural approaches for studying language understanding.

 
