Abstract
Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Answers are based on a language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to date for evaluating question answering. We benchmark state-of-the-art extractive question answering baselines, trained on Natural Questions, including Multilingual BERT and XLM-RoBERTa, in zero-shot and translation settings. Results indicate this dataset is challenging, especially in low-resource languages.