Revisiting Language Encoding in Learning Multilingual Representations

Abstract

The Transformer has demonstrated great power in learning contextual word representations for multiple languages in a single model. To process multilingual sentences in the model, a learnable vector, called a "language embedding", is usually assigned to each language. The language embedding can be either added to the word embedding or attached at the beginning of the sentence. It serves as a language-specific signal that lets the Transformer capture contextual representations across languages. In this paper, we revisit the use of language embedding and identify several problems in the existing formulations. By investigating the interaction between language embedding and word embedding in the self-attention module, we find that the current methods cannot reflect language-specific word correlation well. Given these findings, we propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding. For a sentence, XLP projects the word embeddings into a language-specific semantic space, and the projected embeddings are then fed into the Transformer model to be processed with their language-specific meanings. In this way, XLP achieves the purpose of appropriately encoding "language" in a multilingual Transformer model. Experimental results show that XLP can freely and significantly boost model performance on extensive multilingual benchmark datasets. Code and models will be released at https://github.com/lsj2408/XLP.
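
To make the contrast concrete, the following is a minimal sketch, not the released implementation, of the two ways of encoding "language" described above: adding one language vector to every word embedding, versus projecting the word embeddings with a language-specific transformation. The abstract does not state the exact form of the projection, so the per-language square matrix used here is an assumption, as are all function names, the dimension, and the random stand-ins for learned weights.

```python
import random


def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of random floats (stand-in for learned weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def add_language_embedding(word_embs, lang_emb):
    """Baseline: add the same language vector to every word embedding."""
    return [[w + l for w, l in zip(vec, lang_emb)] for vec in word_embs]


def project_to_language_space(word_embs, proj):
    """Hypothetical XLP-style step: multiply each word embedding (as a row
    vector) by a per-language matrix. The matrix form is an assumption."""
    return [
        [sum(vec[i] * proj[i][j] for i in range(len(vec))) for j in range(len(proj[0]))]
        for vec in word_embs
    ]


rng = random.Random(0)
dim = 4                                   # toy embedding size (assumption)
word_embs = random_matrix(3, dim, rng)    # a 3-token "sentence"

# One learnable vector per language (baseline) and one matrix per language (sketch).
lang_embedding = {lang: random_matrix(1, dim, rng)[0] for lang in ("en", "fr")}
lang_projection = {lang: random_matrix(dim, dim, rng) for lang in ("en", "fr")}

added_en = add_language_embedding(word_embs, lang_embedding["en"])
projected_en = project_to_language_space(word_embs, lang_projection["en"])
projected_fr = project_to_language_space(word_embs, lang_projection["fr"])
```

Both variants return one vector per token with the original dimension, so either could be fed to the same Transformer encoder. In this toy setup the additive variant shifts every token by the same vector, leaving differences between tokens unchanged, while the projection variant transforms the tokens differently for each language; this is only an illustration of the idea, not the paper's analysis.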
