GTAlign: Game-Theoretic Alignment of LLM Assistants for Social Welfare

  • 2025-11-03 18:54:17
  • Siqi Zhu, David Zhang, Pedro Cisneros-Velarde, Jiaxuan You

Abstract

Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision-making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision-making into both reasoning and training. During reasoning, the model explicitly treats user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a social welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt the LLM's responses when the pricing policies of the LLM service change. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and social welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign .
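
The prisoner's dilemma analogy and the idea of selecting mutually beneficial actions from a payoff matrix can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the action names, the textbook payoff values, and the use of a simple sum as the social welfare measure. The abstract does not state which welfare function or payoff estimates GTAlign actually uses.

```python
from itertools import product

# Hypothetical action labels; the abstract does not name the actions.
ACTIONS = ("cooperate", "defect")

# Assumed textbook prisoner's dilemma payoffs, not values from the paper.
# PAYOFF[(user_action, llm_action)] = (user_payoff, llm_payoff)
PAYOFF = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def social_welfare(profile):
    """Assumed welfare measure: the sum of both players' payoffs."""
    user_payoff, llm_payoff = PAYOFF[profile]
    return user_payoff + llm_payoff


def is_nash(profile):
    """True if neither player gains by unilaterally changing its action."""
    user_action, llm_action = profile
    user_payoff, llm_payoff = PAYOFF[profile]
    user_ok = all(PAYOFF[(a, llm_action)][0] <= user_payoff for a in ACTIONS)
    llm_ok = all(PAYOFF[(user_action, a)][1] <= llm_payoff for a in ACTIONS)
    return user_ok and llm_ok


profiles = list(product(ACTIONS, ACTIONS))
nash = [p for p in profiles if is_nash(p)]   # individually rational outcome
best = max(profiles, key=social_welfare)     # socially efficient outcome

print("Nash equilibria:", nash)
print("Welfare-maximizing profile:", best, "welfare =", social_welfare(best))
```

Under these assumed payoffs, the only pure-strategy equilibrium is mutual defection, while mutual cooperation yields the highest total welfare. That gap between the individually rational and the socially efficient outcome is what the abstract describes a welfare-oriented selection rule as closing.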

 
