Towards Neural No-Resource Language Translation: A Comparative Evaluation of Approaches

Abstract

No-resource languages, those with minimal or no digital representation, pose unique challenges for machine translation (MT). Unlike low-resource languages, which rely on limited but existent corpora, no-resource languages often have fewer than 100 sentences available for training. This work explores the problem of no-resource translation through three distinct workflows: fine-tuning of translation-specific models, in-context learning with large language models (LLMs) using chain-of-reasoning prompting, and direct prompting without reasoning. Using Owens Valley Paiute as a case study, we demonstrate that no-resource translation demands fundamentally different approaches from low-resource scenarios, because traditional approaches to machine translation, including those that work for low-resource languages, fail. Empirical results reveal that the in-context learning capabilities of general-purpose large language models enable no-resource language translation that outperforms low-resource translation approaches and rivals human translations (BLEU 0.45-0.6); specifically, chain-of-reasoning prompting outperforms other methods for larger corpora, while direct prompting exhibits advantages in smaller datasets. As these approaches are language-agnostic, they have the potential to be generalized to translation tasks from a wide variety of no-resource languages without expert input. These findings establish no-resource translation as a distinct paradigm requiring innovative solutions, providing practical and theoretical insights for language preservation.
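
To make the difference between the two prompting workflows concrete, the following is a minimal sketch of how a direct prompt and a chain-of-reasoning prompt might be assembled from a small parallel corpus. It is an illustration only: the function names (`build_direct_prompt`, `build_reasoning_prompt`), the prompt wording, and the placeholder sentence pairs are assumptions, not the prompts used in this work, and the call to an actual LLM is deliberately left out.

```python
from typing import List, Tuple

# (source-language sentence, English translation); hypothetical type alias
Pair = Tuple[str, str]


def build_direct_prompt(pairs: List[Pair], source: str) -> str:
    """Direct prompting: list the parallel sentences, then ask for a translation."""
    lines = ["Translate the final sentence into English.", ""]
    for src, tgt in pairs:
        lines.append(f"Source: {src}")
        lines.append(f"English: {tgt}")
        lines.append("")
    lines.append(f"Source: {source}")
    lines.append("English:")
    return "\n".join(lines)


def build_reasoning_prompt(pairs: List[Pair], source: str) -> str:
    """Chain-of-reasoning prompting: same examples, but request explicit steps first."""
    direct = build_direct_prompt(pairs, source)
    # Drop the trailing "English:" cue so the model reasons before answering.
    body = direct.rsplit("\n", 1)[0]
    instructions = (
        "Before answering, reason step by step: align each word of the final "
        "sentence with words in the examples, infer likely meanings and word "
        "order, then give the translation on a line starting with 'English:'."
    )
    return body + "\n" + instructions


# Placeholder corpus; in a no-resource setting this would hold fewer than 100 pairs.
corpus = [("<source sentence 1>", "<English sentence 1>")]
print(build_direct_prompt(corpus, "<new source sentence>"))
```

Both builders take the entire corpus as in-context examples, which is only feasible because the corpus is so small; nothing in the sketch depends on the particular language.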
