Abstract
Learning what to share between tasks has been a topic of great importance recently, as strategic sharing of knowledge has been shown to improve downstream task performance. This is particularly important for multilingual applications, as most languages in the world are under-resourced. Here, we consider the setting of training models on multiple different languages at the same time, when little or no data is available for languages other than English. We show that this challenging setup can be approached using meta-learning, where, in addition to training a source language model, another model learns to select which training instances are the most beneficial to the first. We experiment using standard supervised, zero-shot cross-lingual, as well as few-shot cross-lingual settings for different natural language understanding tasks (natural language inference, question answering). Our extensive experimental setup demonstrates the consistent effectiveness of meta-learning for a total of 15 languages. We improve upon the state-of-the-art for zero-shot and few-shot NLI (on MultiNLI and XNLI) and QA (on the MLQA dataset). A comprehensive error analysis indicates that the correlation of typological features between languages can partly explain when parameter sharing learned via meta-learning is beneficial.