Abstract
Despite advances in dependency parsing, languages with small treebanks still present challenges. We assess recent approaches to multilingual contextual word representations (CWRs) and compare them for crosslingual transfer from a language with a large treebank to a language with a small or nonexistent treebank, by sharing parameters between languages in the parser itself. We experiment with a diverse selection of languages in both simulated and truly low-resource scenarios, and show that multilingual CWRs greatly facilitate low-resource dependency parsing even without crosslingual supervision such as dictionaries or parallel text. Furthermore, we examine the non-contextual part of the learned language models (which we call a "decontextual probe") to demonstrate that polyglot language models encode crosslingual lexical correspondence better than aligned monolingual language models do. This analysis provides further evidence that polyglot training is an effective approach to crosslingual transfer.