On Language Models for Creoles

  • 2021-09-13 15:51:15
  • Heather Lent, Emanuele Bugliarello, Miryam de Lhoneux, Chen Qiu, Anders Søgaard


Creole languages such as Nigerian Pidgin English and Haitian Creole are under-resourced and largely ignored in the NLP literature. Creoles typically result from the fusion of a foreign language with multiple local languages, and what grammatical and lexical features are transferred to the creole is a complex process. While creoles are generally stable, the prominence of some features may be much stronger with certain demographics or in some linguistic situations. This paper makes several contributions: We collect existing corpora and release models for Haitian Creole, Nigerian Pidgin English, and Singaporean Colloquial English. We evaluate these models on intrinsic and extrinsic tasks. Motivated by the above literature, we compare standard language models with distributionally robust ones and find that, somewhat surprisingly, the standard language models are superior to the distributionally robust ones. We investigate whether this is an effect of over-parameterization or relative distributional stability, and find that the difference persists in the absence of over-parameterization, and that drift is limited, confirming the relative stability of creole languages.


