A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

Abstract

Rapidly developing Large Language Models (LLMs) demonstrate remarkable multilingual capabilities in natural language processing, attracting global attention in both academia and industry. Developing language-fair technology is important for mitigating potential discrimination and for improving usability and accessibility across diverse language user groups. Despite the breakthroughs of LLMs, the multilingual scenario remains insufficiently investigated, and a comprehensive survey that summarizes recent approaches, developments, limitations, and potential solutions is desirable. To this end, we provide a survey with multiple perspectives on the use of LLMs in the multilingual scenario. We first rethink the transitions between previous and current research on pre-trained language models. We then introduce several perspectives on the multilingualism of LLMs, including training and inference methods, model security, multi-domain applications with language culture, and the usage of datasets. We also discuss the major challenges that arise in these aspects, along with possible solutions. In addition, we highlight future research directions aimed at further enhancing LLMs with multilingualism. The survey aims to help the research community address multilingual problems and to provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
