Marrying Universal Dependencies and Universal Morphology

  • 2018-10-15 23:00:13
  • Arya D. McCarthy, Miikka Silfverberg, Ryan Cotterell, Mans Hulden, David Yarowsky

Abstract

The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages: UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. With compatibility of tags, each project's annotations could be used to validate the other's. Additionally, the availability of both type- and token-level resources would be a boon to tasks such as parsing and homograph disambiguation. To ease this interoperability, we present a deterministic mapping from Universal Dependencies v2 features into the UniMorph schema. We validate our approach by lookup in the UniMorph corpora and find a macro-average of 64.13% recall. We also note incompatibilities due to paucity of data on either side. Finally, we present a critical evaluation of the foundations, strengths, and weaknesses of the two annotation projects.
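To make the idea of a deterministic mapping and lookup-based validation concrete, the following is a minimal sketch and not the authors' implementation. The mapping table is a small illustrative subset of real UD v2 features and UniMorph tags, the function names (`convert`, `lookup_recall`, `macro_average`) are hypothetical, and the recall definition (share of converted tokens whose tag set appears in the lexicon for the same lemma and form) is an assumption about how such a lookup could be scored.

```python
# Illustrative subset only: the paper's full mapping table is not reproduced here.
UD_TO_UNIMORPH = {
    "Number=Sing": "SG",
    "Number=Plur": "PL",
    "Case=Nom": "NOM",
    "Case=Acc": "ACC",
    "Tense=Past": "PST",
    "Tense=Pres": "PRS",
    "Person=3": "3",
    "Mood=Ind": "IND",
}
UPOS_TO_UNIMORPH = {"NOUN": "N", "VERB": "V", "ADJ": "ADJ"}


def convert(upos, feats):
    """Map a UD part of speech and FEATS string to a set of UniMorph tags.

    `feats` uses the CoNLL-U convention: 'Key=Value' pairs joined by '|',
    or '_' when the token has no features. Unmapped features are dropped.
    """
    tags = set()
    if upos in UPOS_TO_UNIMORPH:
        tags.add(UPOS_TO_UNIMORPH[upos])
    if feats != "_":
        for feat in feats.split("|"):
            tag = UD_TO_UNIMORPH.get(feat)
            if tag is not None:
                tags.add(tag)
    return frozenset(tags)


def lookup_recall(converted, lexicon):
    """Assumed scoring: fraction of tokens whose converted tag set is listed
    in the type-level lexicon for the same (lemma, form) pair.

    converted: iterable of (lemma, form, frozenset_of_tags)
    lexicon:   dict mapping (lemma, form) to a set of frozensets of tags
    Tokens absent from the lexicon are skipped; returns None if none remain.
    """
    hits = total = 0
    for lemma, form, tags in converted:
        gold = lexicon.get((lemma, form))
        if gold is None:
            continue
        total += 1
        hits += tags in gold
    return hits / total if total else None


def macro_average(per_language):
    """Unweighted mean of per-language scores, ignoring languages with no score."""
    scores = [s for s in per_language.values() if s is not None]
    return sum(scores) / len(scores)
```

For example, `convert("VERB", "Mood=Ind|Number=Sing|Person=3|Tense=Pres")` yields the tag set `{V, IND, SG, 3, PRS}`, which corresponds to a UniMorph bundle such as `V;IND;PRS;3;SG`. Comparing sets rather than strings sidesteps differences in tag order.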

 
