SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

  • 2020-06-20 13:24:14
  • Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language, such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low-resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrated the utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems, which achieved over 90% mean accuracy on them, while others were more challenging.
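
To make the "mean accuracy" per language family concrete, the following is a minimal sketch of one plausible way to compute it. It assumes exact-match accuracy per language and an unweighted mean over the languages in each family; the abstract does not state the averaging scheme, so both are assumptions. The function name `family_mean_accuracy`, the language and family labels, and the toy forms are all hypothetical and not taken from the task data.

```python
from collections import defaultdict


def family_mean_accuracy(predictions, gold, family_of):
    """Return (per-language accuracy, per-family mean accuracy).

    predictions, gold: dict mapping language -> list of inflected forms,
                       aligned by position (hypothetical data layout).
    family_of:         dict mapping language -> language family.
    Assumption: family accuracy is the unweighted mean of language accuracies.
    """
    per_language = {}
    for lang, gold_forms in gold.items():
        pred_forms = predictions[lang]
        # Exact match: a prediction counts only if the whole form is identical.
        correct = sum(p == g for p, g in zip(pred_forms, gold_forms))
        per_language[lang] = correct / len(gold_forms)

    by_family = defaultdict(list)
    for lang, acc in per_language.items():
        by_family[family_of[lang]].append(acc)

    family_mean = {fam: sum(accs) / len(accs) for fam, accs in by_family.items()}
    return per_language, family_mean


# Toy, made-up data purely for illustration.
gold = {
    "lang_a": ["a", "b", "c", "d"],
    "lang_b": ["e", "f"],
    "lang_c": ["g", "h"],
}
predictions = {
    "lang_a": ["a", "b", "c", "x"],  # 3 of 4 correct
    "lang_b": ["e", "f"],            # 2 of 2 correct
    "lang_c": ["g", "z"],            # 1 of 2 correct
}
family_of = {"lang_a": "fam_x", "lang_b": "fam_x", "lang_c": "fam_y"}

per_language, family_mean = family_mean_accuracy(predictions, gold, family_of)
print(per_language)
print(family_mean)
```

Averaging per language rather than pooling all test items keeps a large language from dominating its family's score, which matters when many of the languages are low-resource.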


