XOR QA: Cross-lingual Open-Retrieval Question Answering

  • 2021-04-13 05:22:01
  • Akari Asai, Jungo Kasai, Jonathan H. Clark, Kenton Lee, Eunsol Choi, Hannaneh Hajishirzi
Multilingual question answering tasks typically assume answers exist in the same language as the question. Yet in practice, many languages face both information scarcity -- where languages have few reference articles -- and information asymmetry -- where questions reference concepts from other cultures. This work extends open-retrieval question answering to a cross-lingual setting enabling questions from one language to be answered via answer content from another language. We construct a large-scale dataset built on questions from TyDi QA lacking same-language answers. Our task formulation, called Cross-lingual Open Retrieval Question Answering (XOR QA), includes 40k information-seeking questions from across 7 diverse non-English languages. Based on this dataset, we introduce three new tasks that involve cross-lingual document retrieval using multi-lingual and English resources. We establish baselines with state-of-the-art machine translation systems and cross-lingual pretrained models. Experimental results suggest that XOR QA is a challenging task that will facilitate the development of novel techniques for multilingual question answering. Our data and code are available at https://nlp.cs.washington.edu/xorqa.


