Low Resource Neural Machine Translation: A Benchmark for Five African Languages

  • 2020-03-31 17:50:07
  • Surafel M. Lakew, Matteo Negri, Marco Turchi

Abstract

Recent advances in Neural Machine Translation (NMT) have shown improvements in low-resource language (LRL) translation tasks. In this work, we benchmark NMT between English and five African LRLs (Swahili, Amharic, Tigrigna, Oromo, Somali [SATOS]). We collected the available resources for the SATOS languages to evaluate the current state of NMT for LRLs. Our evaluation, which compares a baseline single-language-pair NMT model against semi-supervised learning, transfer learning, and multilingual modeling, shows significant performance improvements in both the En-LRL and LRL-En directions. In terms of averaged BLEU score, the multilingual approach shows the largest gains, up to +5 points, in six out of ten translation directions. To demonstrate the generalization capability of each model, we also report results on multi-domain test sets. We release the standardized experimental data and test sets for future work addressing the challenges of NMT in under-resourced settings, in particular for the SATOS languages.
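Since the results above are reported as averaged BLEU scores and gains in BLEU points, the following is a minimal sketch of how such numbers can be computed. It uses the textbook corpus-level BLEU definition (clipped n-gram precisions for n = 1..4, geometric mean, brevity penalty) with one reference per sentence, whitespace tokenization, and no smoothing. These are assumptions: the abstract does not state which scorer, tokenization, or smoothing the authors used, and the helper names `corpus_bleu` and `averaged_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Hypothetical helper: corpus-level BLEU (0-100), one reference per
    sentence, whitespace tokenization, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


def averaged_bleu(test_sets):
    """Hypothetical helper: mean corpus BLEU over (hypotheses, references) pairs."""
    scores = [corpus_bleu(hyps, refs) for hyps, refs in test_sets]
    return sum(scores) / len(scores)
```

Under this sketch, a gain such as "+5 points" is the difference between two systems' `averaged_bleu` values on the same test sets. For published comparisons, a standardized scorer such as sacreBLEU is normally preferred, because BLEU is sensitive to tokenization and smoothing choices.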
