ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework

Abstract

Although fine-tuning Large Language Models (LLMs) with multilingual data can rapidly enhance the multilingual capabilities of LLMs, they still exhibit a performance gap between the dominant language (e.g., English) and non-dominant ones due to the imbalance of training data across languages. To further enhance the performance of non-dominant languages, we propose ShifCon, a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one. Specifically, it shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters. The enriched representations are then shifted back into their original language subspace before generation. Moreover, we introduce a subspace distance metric to pinpoint the optimal layer area for shifting representations and employ multilingual contrastive learning to further enhance the alignment of representations within this area. Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages, particularly for low-resource ones. Further analysis offers extra insights to verify the effectiveness of ShifCon and propel future research.
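
The shift-and-shift-back idea can be illustrated with a toy sketch. The abstract does not say how the shift is computed, so the version below assumes a simple mean-difference shift: each language subspace is summarized by the mean of its hidden states, and a representation is moved by subtracting its own language mean and adding the target language mean. The names `mean_vector` and `shift`, and the two-dimensional toy states, are hypothetical and serve only as illustration; this is not the paper's implementation.

```python
from typing import List

Vector = List[float]


def mean_vector(states: List[Vector]) -> Vector:
    """Average a batch of hidden states dimension by dimension."""
    n = len(states)
    return [sum(column) / n for column in zip(*states)]


def shift(states: List[Vector], from_mean: Vector, to_mean: Vector) -> List[Vector]:
    """Move states from one language subspace toward another.

    Assumption: a subspace is summarized by its mean vector, so shifting
    is "subtract the source mean, add the target mean".
    """
    return [
        [x - f + t for x, f, t in zip(row, from_mean, to_mean)]
        for row in states
    ]


# Toy hidden states (2 sentences x 2 dimensions) for a dominant language
# and a non-dominant one; real hidden states would come from a model layer.
dominant_states = [[1.0, 2.0], [3.0, 4.0]]
non_dominant_states = [[-1.0, 0.0], [-3.0, 2.0]]

mu_dominant = mean_vector(dominant_states)
mu_non_dominant = mean_vector(non_dominant_states)

# Shift into the dominant-language subspace before the middle layers ...
shifted = shift(non_dominant_states, mu_non_dominant, mu_dominant)

# ... and shift back into the original subspace before generation.
restored = shift(shifted, mu_dominant, mu_non_dominant)
```

Under this assumption the shifted states share the dominant language's mean, and shifting back recovers the original states exactly whenever nothing modifies them in between. In an actual model the intermediate layers would transform the shifted states, so the round trip would not be an identity.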
