MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind

Abstract

Large Language Model (LLM) agents have demonstrated impressive capabilities in social deduction games (SDGs) like Werewolf, where strategic reasoning and social deception are essential. However, current approaches remain limited to textual information, ignoring crucial multimodal cues such as facial expressions and tone of voice that humans naturally use to communicate. Moreover, existing SDG agents primarily focus on inferring other players' identities without modeling how others perceive themselves or fellow players. To address these limitations, we use One Night Ultimate Werewolf (ONUW) as a testbed and present MultiMind, the first framework integrating multimodal information into SDG agents. MultiMind processes facial expressions and vocal tones alongside verbal content, while employing a Theory of Mind (ToM) model to represent each player's suspicion levels toward others. By combining this ToM model with Monte Carlo Tree Search (MCTS), our agent identifies communication strategies that minimize suspicion directed at itself. Through comprehensive evaluation in both agent-versus-agent simulations and studies with human players, we demonstrate MultiMind's superior performance in gameplay. Our work presents a significant advancement toward LLM agents capable of human-like social reasoning across multimodal domains.
