Abstract
Large Language Models (LLMs) are pretrained on extensive multilingual corpora to acquire both language-specific cultural knowledge and general knowledge. Ideally, LLMs should provide consistent responses to culture-independent questions across languages, yet we observe significant performance disparities. To address this, we explore the Cross-Lingual Self-Aligning ability of Language Models (CALM) to align knowledge across languages. Specifically, for a given question, we sample multiple responses across different languages and select the most self-consistent response as the target, leaving the remaining responses as negative examples. We then employ direct preference optimization (DPO) to align the model's knowledge across different languages. Evaluations on the MEDQA and X-CSQA datasets demonstrate CALM's effectiveness in enhancing cross-lingual knowledge question answering, in both zero-shot and retrieval-augmented settings. We also find that increasing the number of languages involved in CALM training leads to higher accuracy and consistency. We offer a qualitative analysis of how cross-lingual consistency can enhance knowledge alignment and explore the method's generalizability.
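The selection step described above can be illustrated with a minimal sketch. It assumes that "most self-consistent" means a simple majority vote over answer labels sampled in different languages; the abstract does not fix the exact criterion, so that choice, the tie handling, the function name `build_preference_pairs`, and the `prompt`/`chosen`/`rejected` record layout are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def build_preference_pairs(question, answers_by_lang):
    """Build DPO-style preference pairs from answers sampled in several languages.

    `answers_by_lang` maps a language code to the answer label sampled in that
    language (hypothetical input format, e.g. a multiple-choice option letter).
    """
    # Count how many languages produced each answer.
    votes = Counter(answers_by_lang.values())
    top_count = max(votes.values())
    winners = [answer for answer, count in votes.items() if count == top_count]

    # Assumption: without a unique majority answer, the question is skipped.
    if len(winners) != 1:
        return []

    chosen = winners[0]
    # Every distinct answer that disagrees with the majority is a negative.
    rejected = sorted({a for a in answers_by_lang.values() if a != chosen})
    return [
        {"prompt": question, "chosen": chosen, "rejected": r}
        for r in rejected
    ]


pairs = build_preference_pairs(
    "Which vitamin deficiency causes scurvy?",
    {"en": "B", "zh": "B", "fr": "A", "ja": "C"},
)
```

In this sketch a question on which all languages already agree yields no pairs, since there is no negative example to contrast with the target. The resulting records would then be passed to a DPO training loop, which is outside the scope of the sketch.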