Establishing Interlingua in Multilingual Language Models

  • 2021-09-02 20:53:14
  • Maksym Del, Mark Fishel
  • 2


Large multilingual language models show remarkable zero-shot cross-lingual transfer performance on a range of tasks. Follow-up works hypothesized that these models internally project representations of different languages into a shared interlingual space. However, they produced contradictory results. In this paper, we correct the famous prior work claiming that "BERT is not an Interlingua" and show that, with the proper choice of sentence representation, different languages actually do converge to a shared space in such language models. Furthermore, we demonstrate that this convergence pattern is robust across four measures of correlation similarity and six mBERT-like models. We then extend our analysis to 28 diverse languages and find that the interlingual space exhibits a particular structure similar to the linguistic relatedness of languages. We also highlight a few outlier languages that seem to fail to converge to the shared space. The code for replicating our results is available at the following URL:
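
As an illustration of the kind of comparison the abstract describes, the sketch below scores how similar two sets of sentence representations are, for example the same parallel sentences encoded in two languages at one layer. It rests on assumptions that the abstract does not confirm: mean-pooling over tokens stands in for the "proper choice of sentence representation", and linear CKA stands in for one correlation-style similarity measure (the four measures used in the paper are not named here). The function names `mean_pool` and `linear_cka` are hypothetical, and this is not the authors' released code.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions.

    Assumption: mean-pooling is used as the sentence representation.
    token_embeddings: (n_sentences, n_tokens, dim)
    attention_mask:   (n_sentences, n_tokens), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts


def linear_cka(x, y):
    """Linear CKA between two representation matrices.

    Assumption: CKA is one possible correlation-style similarity measure.
    x: (n_sentences, dim_x), y: (n_sentences, dim_y); rows must be aligned,
    i.e. row i of x and row i of y encode translations of the same sentence.
    """
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))
```

In use, one would extract hidden states for aligned sentences in two languages from a given layer, pool them with `mean_pool`, and pass the two resulting matrices to `linear_cka`; a score close to 1 at a layer would be consistent with the two languages sharing a representation space there.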


