Machine-Created Universal Language for Cross-lingual Transfer

  • 2023-05-22 15:41:09
  • Yaobo Liang, Quanzhi Zhu, Junhe Zhao, Nan Duan
There are two main approaches to cross-lingual transfer: multilingual pre-training, which implicitly aligns the hidden representations of different languages, and translate-test, which explicitly translates different languages into an intermediate language such as English. Translate-test is more interpretable than multilingual pre-training. However, translate-test performs worse than multilingual pre-training (Conneau and Lample, 2019; Conneau et al., 2020) and cannot solve word-level tasks because translation rearranges the word order. We therefore propose a Machine-created Universal Language (MUL) as a new intermediate language. MUL consists of a set of discrete symbols that form a universal vocabulary, together with an NL-MUL translator that translates multiple natural languages into MUL. MUL unifies common concepts from different languages into the same universal word for better cross-lingual transfer. MUL also preserves language-specific words and word order, so the model can be easily applied to word-level tasks. Our experiments show that translating into MUL achieves better performance than multilingual pre-training, and our analyses show that MUL has good interpretability.
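
The two properties claimed for MUL, shared symbols for common concepts and preserved word order, can be illustrated with a toy Python sketch. This is not the paper's NL-MUL translator, whose design the abstract does not describe. The lookup table, the symbol names such as `U_CAT`, and the function `to_mul` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical universal vocabulary: (language, word) -> universal symbol.
# Invented for this example; a real system would learn this mapping.
TOY_NL_TO_MUL: Dict[Tuple[str, str], str] = {
    ("en", "cat"): "U_CAT",
    ("fr", "chat"): "U_CAT",
    ("en", "sleeps"): "U_SLEEP",
    ("fr", "dort"): "U_SLEEP",
}


def to_mul(tokens: List[str], lang: str) -> List[str]:
    """Map tokens to universal symbols one-to-one, keeping the input order.

    Tokens with no shared concept in the toy table are kept unchanged,
    standing in for language-specific words.
    """
    return [TOY_NL_TO_MUL.get((lang, tok.lower()), tok) for tok in tokens]


en_mul = to_mul(["The", "cat", "sleeps"], "en")
fr_mul = to_mul(["Le", "chat", "dort"], "fr")

# Shared concepts collapse to the same symbol at the same position,
# so word-level labels (e.g. tags per token) still line up.
print(en_mul)
print(fr_mul)
```

Because the output has exactly one symbol per input token, a word-level label sequence for the original sentence can be reused unchanged on the MUL sequence, which is what a reordering translation into English would not allow.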


