Cross-lingual Transferring of Pre-trained Contextualized Language Models

Abstract

Although pre-trained contextualized language models (PrLMs) have had a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora large enough to train powerful PrLMs, and, because human languages share many commonalities, computationally expensive PrLM training for each language is somewhat redundant. In this work, building on recent work connecting cross-lingual model transfer and neural machine translation, we propose TreLM, a novel cross-lingual model transfer framework for PrLMs. To handle differences in symbol order and sequence length between languages, we propose an intermediate "TRILayer" structure that learns from these differences and yields better transfer in our primary translation direction, as well as a new cross-lingual language modeling objective for transfer training. Additionally, we present an embedding alignment method that adversarially adapts a PrLM's non-contextualized embedding space and the TRILayer structure to learn a text transformation network across languages, which addresses the vocabulary differences between languages. Experiments on both language understanding and structure parsing tasks show that the proposed framework significantly outperforms language models trained from scratch with limited data, in both performance and efficiency. Moreover, although it shows a negligible performance loss compared with pre-training from scratch in resource-rich scenarios, our cross-lingual model transfer framework is significantly more economical.
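The abstract names adversarial embedding alignment but does not describe the procedure. As a rough illustration of the general idea only, the sketch below implements a generic, MUSE-style adversarial alignment in NumPy: a linear map W is trained to fool a logistic-regression discriminator, and an orthogonality update keeps W close to a rotation. This is an assumption-laden toy, not TreLM's method; it ignores the TRILayer entirely, and every name in it (`adversarial_align`, `lr_d`, `lr_w`, `beta`) is hypothetical.

```python
import numpy as np


def sigmoid(s: np.ndarray) -> np.ndarray:
    """Logistic function used by the discriminator."""
    return 1.0 / (1.0 + np.exp(-s))


def adversarial_align(src: np.ndarray, tgt: np.ndarray, steps: int = 300,
                      lr_d: float = 0.1, lr_w: float = 0.005,
                      beta: float = 0.1, seed: int = 0) -> np.ndarray:
    """Hypothetical toy: learn a linear map W so that src @ W resembles tgt.

    A logistic-regression discriminator (v, b) labels mapped source rows 1
    and target rows 0; W is updated so mapped rows look like target rows.
    Not TreLM's procedure; a generic adversarial-alignment illustration.
    """
    rng = np.random.default_rng(seed)
    d = src.shape[1]
    W = np.eye(d)                          # start from the identity map
    v = rng.normal(scale=0.01, size=d)     # discriminator weights
    b = 0.0                                # discriminator bias
    y = np.concatenate([np.ones(len(src)), np.zeros(len(tgt))])
    for _ in range(steps):
        mapped = src @ W
        # Discriminator step: gradient descent on binary cross-entropy.
        z = np.vstack([mapped, tgt])
        p = sigmoid(z @ v + b)
        v -= lr_d * (z.T @ (p - y)) / len(y)
        b -= lr_d * np.mean(p - y)
        v /= max(1.0, np.linalg.norm(v))   # bound ||v|| <= 1 for stability
        # Mapping step: minimize -log(1 - D(src @ W)), i.e. fool D.
        p_src = sigmoid(mapped @ v + b)
        grad_W = np.outer(src.T @ p_src / len(src), v)
        W -= lr_w * grad_W
        # Orthogonality update: W <- (1 + beta) W - beta (W W^T) W.
        W = (1 + beta) * W - beta * W @ W.T @ W
    return W


# Toy data: two Gaussian clouds whose means point in different directions.
rng = np.random.default_rng(1)
src = rng.normal(size=(256, 4)) + np.array([2.0, 0.0, 0.0, 0.0])
tgt = rng.normal(size=(256, 4)) + np.array([0.0, 2.0, 0.0, 0.0])
W = adversarial_align(src, tgt)
print("mapped source mean:", np.round(src.mean(axis=0) @ W, 2))
print("target mean:       ", np.round(tgt.mean(axis=0), 2))
```

The orthogonality update keeps W near a rotation, so distances within the source embedding space are roughly preserved while the spaces are aligned. A linear discriminator can only detect differences in means, which is why the toy data differ by a mean shift; practical systems typically use a small MLP discriminator instead.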
