Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

Abstract

Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best-published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language-specific features, and measure factors that influence cross-lingual transfer.
