Abstract
Multilingual natural language processing is receiving increased attention, with numerous models, benchmarks, and methods being released for many languages. English is often used in multilingual evaluation to prompt language models (LMs), mainly to overcome the lack of instruction tuning data in other languages. In this position paper, we lay out two roles of English in multilingual LM evaluations: as an interface and as a natural language. We argue that these roles have different goals: task performance versus language understanding. This discrepancy is highlighted with examples from datasets and evaluation setups. Numerous works explicitly use English as an interface to boost task performance. We recommend moving away from this imprecise method and instead focusing on furthering language understanding.