ChatGPT MT: Competitive for High- (but not Low-) Resource Languages

Abstract

Large language models (LLMs) implicitly learn to perform a range of language tasks, including machine translation (MT). Previous studies explore aspects of LLMs' MT capabilities. However, there exists a wide variety of languages for which recent LLM MT performance has never before been evaluated. Without published experimental evidence on the matter, it is difficult for speakers of the world's diverse languages to know how and whether they can use LLMs for their languages. We present the first experimental evidence for an expansive set of 204 languages, along with MT cost analysis, using the FLORES-200 benchmark. Trends reveal that GPT models approach or exceed traditional MT model performance for some high-resource languages (HRLs) but consistently lag for low-resource languages (LRLs), underperforming traditional MT for 84.1% of the languages we covered. Our analysis reveals that a language's resource level is the most important feature in determining ChatGPT's relative ability to translate it, and suggests that ChatGPT is especially disadvantaged for LRLs and African languages.
