Abstract
While impressive performance has been achieved on the task of Answer Sentence Selection (AS2) for English, the same does not hold for languages that lack large labeled datasets. In this work, we propose Cross-Lingual Knowledge Distillation (CLKD) from a strong English AS2 teacher as a method to train AS2 models for low-resource languages without requiring labeled data in the target language. To evaluate our method, we introduce 1) Xtr-WikiQA, a translation-based WikiQA dataset for 9 additional languages, and 2) TyDi-AS2, a multilingual AS2 dataset with over 70K questions spanning 8 typologically diverse languages. We conduct extensive experiments on Xtr-WikiQA and TyDi-AS2 with multiple teachers, diverse monolingual and multilingual pretrained language models (PLMs) as students, and both monolingual and multilingual training. The results demonstrate that CLKD either outperforms or rivals even supervised fine-tuning with the same amount of labeled data and a combination of machine translation and the teacher model. Our method can potentially enable stronger AS2 models for low-resource languages, while TyDi-AS2 can serve as the largest multilingual AS2 dataset for further studies in the research community.