A Principled Framework for Evaluating on Typologically Diverse Languages

  • 2024-07-06 10:31:02
  • Esther Ploeger, Wessel Poelman, Andreas Holck Høeg-Petersen, Anders Schlichtkrull, Miryam de Lhoneux, Johannes Bjerva


Beyond individual languages, multilingual natural language processing (NLP) research increasingly aims to develop models that perform well across languages generally. However, evaluating these systems on all the world's languages is practically infeasible. To attain generalizability, representative language sampling is essential. Previous work argues that generalizable multilingual evaluation sets should contain languages with diverse typological properties. However, 'typologically diverse' language samples have been found to vary considerably in this regard, and popular sampling methods are flawed and inconsistent. We present a language sampling framework for selecting highly typologically diverse languages given a sampling frame, informed by language typology. We compare sampling methods with a range of metrics and find that our systematic methods consistently retrieve more typologically diverse language selections than previous methods in NLP. Moreover, we provide evidence that this affects generalizability in multilingual model evaluation, emphasizing the importance of diverse language sampling in NLP evaluation.
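
To make the idea of diversity-driven sampling from a sampling frame concrete, the following is a minimal sketch and not necessarily the procedure used in the paper. It assumes each language in the frame is described by a vector of categorical typological features, measures distance as the fraction of differing features, and greedily adds the language that is farthest from those already selected (a farthest-point heuristic). The function names, the language labels, and the feature values are all hypothetical.

```python
from itertools import combinations


def feature_distance(a, b):
    """Fraction of feature positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_maxmin_sample(frame, k):
    """Greedily select k languages that are far apart in feature space.

    frame: dict mapping a language label to a tuple of categorical features.
    This is an illustrative heuristic, assumed for this sketch only.
    """
    if not 1 <= k <= len(frame):
        raise ValueError("k must be between 1 and the size of the frame")
    names = sorted(frame)  # fixed order makes tie-breaking deterministic
    if k == 1:
        return [names[0]]
    # Seed the sample with the most distant pair in the frame.
    first, second = max(
        combinations(names, 2),
        key=lambda pair: feature_distance(frame[pair[0]], frame[pair[1]]),
    )
    selected = [first, second]
    while len(selected) < k:
        remaining = [n for n in names if n not in selected]
        # Add the language whose nearest selected neighbour is farthest away.
        best = max(
            remaining,
            key=lambda n: min(feature_distance(frame[n], frame[s]) for s in selected),
        )
        selected.append(best)
    return selected


# Hypothetical sampling frame: five placeholder languages, four binary features.
frame = {
    "lang_a": (0, 0, 0, 0),
    "lang_b": (0, 0, 0, 1),
    "lang_c": (1, 1, 1, 1),
    "lang_d": (1, 1, 0, 0),
    "lang_e": (0, 0, 1, 1),
}
sample = greedy_maxmin_sample(frame, 3)
```

In this toy frame the near-duplicate pair lang_a and lang_b never both enter the sample, which is the behaviour a diversity-oriented sampler is meant to have. Real typological feature sets contain missing values, which this sketch does not handle.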


