The first open machine translation system for the Chechen language

Abstract

We introduce the first open-source model for translation between the vulnerable Chechen language and Russian, together with the dataset collected to train and evaluate it. We explore fine-tuning as a way to add a new language to NLLB-200, a large language model system for multilingual translation. Our model achieves BLEU / ChrF++ scores of 8.34 / 34.69 for translation from Russian to Chechen and 20.89 / 44.55 for the reverse direction. The release of the translation models is accompanied by parallel corpora of words, phrases, and sentences, and by a multilingual sentence encoder adapted to the Chechen language.
