Similarity of Sentence Representations in Multilingual LMs: Resolving Conflicting Literature and Case Study of Baltic Languages

Abstract

Low-resource languages, such as Baltic languages, benefit from Large Multilingual Models (LMs) that possess remarkable cross-lingual transfer performance capabilities. This work is an interpretation and analysis study into cross-lingual representations of Multilingual LMs. Previous works hypothesized that these LMs internally project representations of different languages into a shared cross-lingual space. However, the literature produced contradictory results. In this paper, we revisit the prior work claiming that "BERT is not an Interlingua" and show that different languages do converge to a shared space in such language models with another choice of pooling strategy or similarity index. Then, we perform cross-lingual representational analysis for the two most popular multilingual LMs employing 378 pairwise language comparisons. We discover that while most languages share a joint cross-lingual space, some do not. However, we observe that Baltic languages do belong to that shared space.
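
To make the terms "pooling strategy" and "similarity index" concrete, the following is a minimal sketch rather than the paper's actual method. The abstract does not name the specific pooling strategies or similarity index used, so mean pooling, first-token pooling, and linear CKA are assumptions chosen purely for illustration, and all function names are hypothetical. The inputs are assumed to be per-token hidden states of shape (sentences, tokens, hidden size), with the two languages' sentences aligned row by row.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: array of shape (n_sentences, n_tokens, hidden_size)
    attention_mask:   array of shape (n_sentences, n_tokens), 1 = real token
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for sentences with no real tokens.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def first_token_pool(token_embeddings):
    """Use the first token's vector as the sentence representation."""
    return token_embeddings[:, 0, :]


def linear_cka(x, y):
    """Linear CKA between two sets of sentence vectors.

    x, y: arrays of shape (n_sentences, dim_x) and (n_sentences, dim_y),
    where row i of x and row i of y describe parallel sentences.
    Assumption: linear CKA stands in for "a similarity index" here.
    """
    # Center each feature across sentences.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


# Hypothetical usage with random stand-ins for model hidden states.
rng = np.random.default_rng(0)
hidden_a = rng.normal(size=(16, 12, 32))  # "language A" token states
hidden_b = rng.normal(size=(16, 12, 32))  # "language B" token states
mask = np.ones((16, 12), dtype=int)

score_mean = linear_cka(mean_pool(hidden_a, mask), mean_pool(hidden_b, mask))
score_first = linear_cka(first_token_pool(hidden_a), first_token_pool(hidden_b))
```

Under these assumptions, the same pair of languages can receive different similarity scores depending on which pooling function feeds the index, which is the kind of sensitivity the abstract points to.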
