Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability

Abstract

Recently, ChatGPT has emerged as a powerful NLP tool that can carry out several tasks. However, the range of languages ChatGPT can handle remains largely a mystery. In this work, we investigate ChatGPT's language identification abilities. For this purpose, we compile Babel-670, a benchmark comprising $670$ languages representing $23$ language families. Languages in Babel-670 run the gamut from the very high-resource to the very low-resource and are spoken across five continents. We then study the ability of ChatGPT (both GPT-3.5 and GPT-4) to (i) identify both language names and language codes (ii) under both zero- and few-shot conditions (iii) with and without provision of a label set. When compared to smaller finetuned language identification tools, we find that ChatGPT lags behind. Our empirical analysis shows that ChatGPT still needs improvement before it can adequately serve diverse communities.
