CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering

  • 2025-01-30 16:15:38
  • Yumeng Wang, Zhiyuan Fan, Qingyun Wang, May Fung, Heng Ji

Abstract

Large Language Models (LLMs) are pretrained on extensive multilingual corpora to acquire both language-specific cultural knowledge and general knowledge. Ideally, while LLMs should provide consistent responses to culture-independent questions across languages, we observe significant performance disparities. To address this, we explore the Cross-Lingual Self-Aligning ability of Language Models (CALM) to align knowledge across languages. Specifically, for a given question, we sample multiple responses across different languages, and select the most self-consistent response as the target, leaving the remaining responses as negative examples. We then employ direct preference optimization (DPO) to align the model's knowledge across different languages. Evaluations on the MEDQA and X-CSQA datasets demonstrate CALM's effectiveness in enhancing cross-lingual knowledge question answering, both in zero-shot and retrieval-augmented settings. We also found that increasing the number of languages involved in CALM training leads to even higher accuracy and consistency. We offer a qualitative analysis of how cross-lingual consistency can enhance knowledge alignment and explore the method's generalizability. The source code and data of this paper are available on GitHub.
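
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one sampled answer per language, multiple-choice answers given as option labels, a simple majority vote as the measure of self-consistency, and that tied questions are skipped. The function name `build_preference_pairs` and the `prompt`/`chosen`/`rejected` record layout are hypothetical choices made here for illustration.

```python
from collections import Counter


def build_preference_pairs(question_by_lang, answer_by_lang):
    """Turn per-language answers to one question into DPO-style pairs.

    question_by_lang: language code -> the question written in that language
    answer_by_lang:   language code -> the option label the model answered
    Both arguments and the record layout are assumptions for this sketch.
    """
    votes = Counter(answer_by_lang.values())
    ranked = votes.most_common()
    if not ranked:
        return []
    # Assumption: without a unique majority answer there is no reliable target.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return []

    chosen = ranked[0][0]                       # most self-consistent answer
    rejected = sorted(a for a in votes if a != chosen)  # remaining answers

    # Pair the target against every other answer, once per language prompt,
    # so the same preference is expressed in each language.
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": bad}
        for prompt in question_by_lang.values()
        for bad in rejected
    ]


questions = {"en": "Q (English)", "zh": "Q (Chinese)", "fr": "Q (French)"}
answers = {"en": "B", "zh": "B", "fr": "C"}
pairs = build_preference_pairs(questions, answers)
```

Records of this shape could then be passed to a DPO training loop. When every language already agrees, the sketch produces no pairs, since there is no negative example to contrast with the target.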
