Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models

  • 2018-09-07 04:17:40
  • Takashi Wada, Tomoharu Iwata

Abstract

We propose an unsupervised method to obtain cross-lingual embeddings without any parallel data or pre-trained word embeddings. The proposed model, which we call a multilingual neural language model, takes sentences of multiple languages as input. It contains bidirectional LSTMs that act as forward and backward language models, and these networks are shared among all the languages. The other parameters, i.e. the word embeddings and the linear transformation between hidden states and outputs, are specific to each language. The shared LSTMs can capture the sentence structure common to all languages. Accordingly, the word embeddings of each language are mapped into a common latent space, making it possible to measure the similarity of words across multiple languages. We evaluate the quality of the cross-lingual word embeddings on a word alignment task. Our experiments demonstrate that our model can obtain cross-lingual embeddings of much higher quality than existing unsupervised models when only a small amount of monolingual data (i.e. 50k sentences) is available, or when the domains of the monolingual data differ across languages.
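To make the idea of a common latent space concrete, the following is a minimal sketch of how word alignment could be read off such embeddings by nearest-neighbour search. It is an illustration under stated assumptions, not the authors' evaluation code: cosine similarity is assumed as the similarity measure, the function names `cosine` and `align` are hypothetical, and the three-dimensional vectors are hand-made toy values rather than trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align(source_word, source_emb, target_emb):
    """Return the target-language word whose vector is closest to source_word.

    Assumes both embedding tables already live in one shared space,
    which is what the shared LSTMs are meant to encourage.
    """
    query = source_emb[source_word]
    return max(target_emb, key=lambda w: cosine(query, target_emb[w]))


# Toy, hand-made vectors (NOT trained embeddings), for illustration only.
en = {"dog": [0.9, 0.1, 0.0], "house": [0.0, 0.2, 0.9]}
de = {"Hund": [0.8, 0.2, 0.1], "Haus": [0.1, 0.1, 0.95]}

print(align("dog", en, de))
print(align("house", en, de))
```

With these toy vectors, each English word retrieves the German word that was placed near it. In the setting the abstract describes, the tables would instead hold the language-specific embeddings learned jointly with the shared LSTMs.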

 
