Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation

Abstract

This paper describes our system (HIT-SCIR) submitted to the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. We base our submission on Stanford's winning system for the CoNLL 2017 shared task and make two effective extensions: 1) incorporating deep contextualized word embeddings into both the part-of-speech tagger and the parser; 2) ensembling parsers trained with different initializations. We also explore different ways of concatenating treebanks for further improvements. Experimental results on the development data show the effectiveness of our methods. In the final evaluation, our system was ranked first according to LAS (75.84%) and outperformed the other systems by a large margin.
