IndT5: A Text-to-Text Transformer for 10 Indigenous Languages

  • 2021-04-27 09:07:50
  • El Moatez Billah Nagoudi, Wei-Rui Chen, Muhammad Abdul-Mageed, Hasan Cavusogl

Abstract

Transformer language models have become fundamental components of natural language processing pipelines. Although several Transformer models have been introduced to serve many languages, there is a shortage of models pre-trained for low-resource and Indigenous languages. In this work, we introduce IndT5, the first Transformer language model for Indigenous languages. To train IndT5, we build IndCorpus, a new dataset for ten Indigenous languages and Spanish. We also present the application of IndT5 to machine translation by investigating different approaches to translate between Spanish and the Indigenous languages as part of our contribution to the AmericasNLP 2021 Shared Task on Open Machine Translation. IndT5 and IndCorpus are publicly available for research.
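As a text-to-text model, IndT5 treats translation as mapping one input string to one output string. The sketch below illustrates that framing for Spanish and an Indigenous language by turning parallel sentence pairs into (input, target) examples. It is a minimal sketch under stated assumptions: the `translate X to Y: ` prefix follows the original T5 convention and is not IndT5's documented input format, the helper names `to_text_to_text` and `build_examples` are hypothetical, and the sentence pair is a placeholder rather than real corpus data.

```python
from dataclasses import dataclass


@dataclass
class Example:
    source: str  # text fed to the encoder
    target: str  # text the decoder should generate


def to_text_to_text(src_lang: str, tgt_lang: str, src: str, tgt: str) -> Example:
    # Hypothetical T5-style task prefix; IndT5's real format may differ.
    return Example(source=f"translate {src_lang} to {tgt_lang}: {src}", target=tgt)


def build_examples(pairs, src_lang, tgt_lang, both_directions=True):
    """Cast (src, tgt) sentence pairs as text-to-text training examples.

    With both_directions=True, each pair also yields the reverse
    direction (tgt -> src), so one parallel corpus covers both ways.
    """
    examples = []
    for src, tgt in pairs:
        examples.append(to_text_to_text(src_lang, tgt_lang, src, tgt))
        if both_directions:
            examples.append(to_text_to_text(tgt_lang, src_lang, tgt, src))
    return examples


# Placeholder pair, not real data.
pairs = [("<Spanish sentence>", "<Quechua sentence>")]
examples = build_examples(pairs, "Spanish", "Quechua")
for ex in examples:
    print(ex.source, "=>", ex.target)
```

Because every direction is expressed only through the input text, a single model and a single training objective can in principle serve all Spanish and Indigenous-language pairs; which directions and prefixes the authors actually used is not stated in this abstract.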
