Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game

Abstract

Achieving Artificial General Intelligence (AGI) requires AI agents that can not only make strategic decisions but also engage in flexible and meaningful communication. Inspired by Wittgenstein's language game theory in Philosophical Investigations, we propose that language agents can learn through in-context interaction rather than traditional multi-stage frameworks that separate decision-making from language expression. Using Werewolf, a social deduction game that tests language understanding, strategic interaction, and adaptability, we develop the Multi-agent Kahneman & Tversky's Optimization (MaKTO). MaKTO engages diverse models in extensive gameplay to generate unpaired desirable and unacceptable responses, then employs KTO to refine the model's decision-making process. In 9-player Werewolf games, MaKTO achieves a 61% average win rate across various models, outperforming GPT-4o and two-stage RL agents by relative improvements of 23.0% and 10.9%, respectively. Notably, MaKTO also demonstrates human-like performance, winning 60% against expert players and showing only 49% detectability in Turing-style blind tests. These results showcase MaKTO's superior decision-making, strategic adaptation, and natural language generation in complex social deduction games.
