How multilingual is Multilingual BERT?

Abstract

In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.
