Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences

  • 2024-07-17 15:46:37
  • Claudio Pinhanez, Paulo Cavalin, Luciana Storto, Thomas Fimbow, Alexander Cobbinah, Julio Nogima, Marisa Vasconcelos, Pedro Domingues, Priscila de Souza Mizukami, Nicole Grell, Majoí Gongora, Isabel Gonçalves

Abstract

Since 2022 we have been exploring application areas and technologies in which Artificial Intelligence (AI) and modern Natural Language Processing (NLP), such as Large Language Models (LLMs), can be employed to foster the usage and facilitate the documentation of Indigenous languages which are in danger of disappearing. We start by discussing the decreasing diversity of languages in the world and how working with Indigenous languages poses unique ethical challenges for AI and NLP. To address those challenges, we propose an alternative development AI cycle based on community engagement and usage. Then, we report encouraging results in the development of high-quality machine learning translators for Indigenous languages by fine-tuning state-of-the-art (SOTA) translators with tiny amounts of data and discuss how to avoid some common pitfalls in the process. We also present prototypes we have built in projects done in 2023 and 2024 with Indigenous communities in Brazil, aimed at facilitating writing, and discuss the development of Indigenous Language Models (ILMs) as a replicable and scalable way to create spell-checkers, next-word predictors, and similar tools. Finally, we discuss how we envision a future for language documentation where dying languages are preserved as interactive language models.

 
