Multilingual Constituency Parsing with Self-Attention and Pre-Training

Abstract

We extend our previous work on constituency parsing (Kitaev and Klein, 2018) by incorporating pre-training for ten additional languages, and compare the benefits of no pre-training, ELMo (Peters et al., 2018), and BERT (Devlin et al., 2018). Pre-training is effective across all languages evaluated, and BERT outperforms ELMo in large part due to the benefits of increased model capacity. Our parser obtains new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1).
