Harvesting Paragraph-Level Question-Answer Pairs from Wikipedia

  • 2018-05-15 17:58:25
  • Xinya Du, Claire Cardie

Abstract

We study the task of generating from Wikipedia articles question-answer pairs that cover content beyond a single sentence. We propose a neural network approach that incorporates coreference knowledge via a novel gating mechanism. Compared to models that only take into account sentence-level information (Heilman and Smith, 2010; Du et al., 2017; Zhou et al., 2017), we find that the linguistic knowledge introduced by the coreference representation aids question generation significantly, producing models that outperform the current state-of-the-art. We apply our system (composed of an answer span extraction system and the passage-level QG system) to the 10,000 top-ranking Wikipedia articles and create a corpus of over one million question-answer pairs. We also provide a qualitative analysis for this large-scale generated corpus from Wikipedia.

 
