Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

  • 2025-06-18 14:21:13
  • Chenchen Yuan, Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

Abstract

Large Language Models (LLMs) have shown impressive moral reasoning abilities. Yet they often diverge when confronted with complex, multi-factor moral dilemmas. To address these discrepancies, we propose a framework that synthesizes multiple LLMs' moral judgments into a collectively formulated moral judgment, realigning models that deviate significantly from this consensus. Our aggregation mechanism fuses continuous moral acceptability scores (beyond binary labels) into a collective probability, weighting contributions by model reliability. For misaligned models, a targeted embedding-optimization procedure fine-tunes token embeddings for moral philosophical theories, minimizing JS divergence to the consensus while preserving semantic integrity. Experiments on a large-scale social moral dilemma dataset show our approach builds robust consensus and improves individual model fidelity. These findings highlight the value of data-driven moral alignment across multiple models and its potential for safer, more consistent AI systems.
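
To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each model's moral acceptability score is a probability in [0, 1], that reliability is a non-negative weight per model, and that the collective probability is a simple reliability-weighted average; the paper's actual aggregation rule may differ. The function names `weighted_consensus` and `js_divergence_bernoulli` are hypothetical. The Jensen-Shannon (JS) divergence is computed between the two-outcome (acceptable / not acceptable) distributions implied by a model's score and the consensus.

```python
import math


def weighted_consensus(scores, reliabilities):
    """Reliability-weighted average of acceptability scores.

    ASSUMPTION: a plain weighted mean stands in for the paper's
    aggregation mechanism, which is not specified in the abstract.
    """
    if len(scores) != len(reliabilities):
        raise ValueError("scores and reliabilities must have equal length")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("reliabilities must sum to a positive value")
    return sum(s * w for s, w in zip(scores, reliabilities)) / total


def _kl_bernoulli(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in nats, with 0 * log(0) taken as 0."""
    kl = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            kl += a * math.log(a / b)
    return kl


def js_divergence_bernoulli(p, q):
    """Jensen-Shannon divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    m = 0.5 * (p + q)  # mixture distribution's acceptability probability
    return 0.5 * _kl_bernoulli(p, m) + 0.5 * _kl_bernoulli(q, m)


if __name__ == "__main__":
    # Hypothetical scores from three models and hypothetical reliability weights.
    scores = [0.9, 0.8, 0.2]
    reliabilities = [1.0, 1.0, 2.0]
    consensus = weighted_consensus(scores, reliabilities)
    for s in scores:
        print(s, js_divergence_bernoulli(s, consensus))
```

Under these assumptions, a model whose JS divergence to the consensus is large would be a candidate for the realignment step the abstract describes; the threshold for "deviates significantly" is not given in the abstract and is left open here.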

 
