One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval

Abstract

We present CORA, a Cross-lingual Open-Retrieval Answer Generation model that can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable. We introduce a new dense passage retrieval algorithm that is trained to retrieve documents across languages for a question. Combined with a multilingual autoregressive generation model, CORA answers directly in the target language without any translation or in-language retrieval modules as used in prior work. We propose an iterative training method that automatically extends annotated data available only in high-resource languages to low-resource ones. Our results show that CORA substantially outperforms the previous state of the art on multilingual open question answering benchmarks across 26 languages, 9 of which are unseen during training. Our analyses show the significance of cross-lingual retrieval and generation in many languages, particularly under low-resource settings.
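The abstract does not spell out how dense cross-lingual retrieval scores passages. The sketch below illustrates generic dense passage retrieval. It assumes that a shared multilingual encoder (not shown) has already mapped the question and every passage, whatever their language, into one vector space, and it ranks passages by the inner product with the question embedding. The function names and the toy vectors are hypothetical stand-ins for encoder outputs. This is not CORA's actual implementation.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    question_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs for the k highest-scoring passages.

    Hypothetical helper: assumes question and passages were embedded into the
    same space by a shared multilingual encoder, so a question in one language
    can score highly against a passage written in another.
    """
    scored = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:k]


# Toy, hand-made embeddings standing in for real encoder outputs (assumption).
passage_texts = [
    "en: Tokyo is the capital of Japan.",
    "ja: 東京は日本の首都である。",
    "es: El Nilo es un río de África.",
]
passage_vecs = [
    [0.9, 0.1, 0.0],
    [0.85, 0.15, 0.05],
    [0.0, 0.2, 0.95],
]
question_vec = [1.0, 0.0, 0.0]  # e.g. an embedded question about Japan's capital

for idx, score in retrieve_top_k(question_vec, passage_vecs, k=2):
    print(f"{score:.2f}  {passage_texts[idx]}")
```

In this toy setup, the English and Japanese passages about Tokyo both outrank the unrelated Spanish one. This is the cross-lingual behavior a shared embedding space is meant to provide. A real system would typically replace the linear scan with an approximate nearest-neighbor index over millions of passages.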
