Game Reasoning Arena: A Framework and Benchmark for Assessing Reasoning Capabilities of Large Language Models via Game Play

Abstract

The Game Reasoning Arena library provides a framework for evaluating the decision-making abilities of large language models (LLMs) through strategic board games implemented in Google's OpenSpiel library. By wrapping multiple board and matrix games and supporting different agent types, the framework enables systematic comparisons between LLM-based agents and other agents (random, heuristic, reinforcement learning agents, etc.) across a variety of game scenarios. It integrates API access to models via liteLLM and local model deployment via vLLM, and offers distributed execution through Ray. This paper summarises the library's structure, key characteristics, and motivation, highlighting how it contributes to the empirical evaluation of LLM reasoning and game-theoretic behaviour.
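
The comparison described above rests on putting heterogeneous agents behind one interface so that a single harness can pit any pair against each other in a game. The sketch below illustrates that idea in a self-contained way and does not use Game Reasoning Arena's or OpenSpiel's actual API. The `Agent` protocol, the `CallableAgent` adapter, the `play_match` harness and the repeated prisoner's-dilemma payoffs are all hypothetical. In the real library an LLM agent would presumably be backed by a liteLLM or vLLM call rather than a plain Python callable.

```python
import random
from typing import Callable, Dict, List, Protocol, Tuple

# Hypothetical example game: repeated prisoner's dilemma.
# Actions: 0 = cooperate, 1 = defect. Payoffs are (player 0, player 1), illustrative only.
PAYOFFS: Dict[Tuple[int, int], Tuple[float, float]] = {
    (0, 0): (3.0, 3.0),
    (0, 1): (0.0, 5.0),
    (1, 0): (5.0, 0.0),
    (1, 1): (1.0, 1.0),
}
LEGAL_ACTIONS: List[int] = [0, 1]
History = List[Tuple[int, int]]


class Agent(Protocol):
    """Hypothetical common interface shared by every agent type."""
    name: str

    def act(self, legal_actions: List[int], history: History) -> int: ...


class RandomAgent:
    """Baseline agent: picks uniformly among legal actions (seeded for reproducibility)."""

    def __init__(self, seed: int = 0) -> None:
        self.name = "random"
        self._rng = random.Random(seed)

    def act(self, legal_actions: List[int], history: History) -> int:
        return self._rng.choice(legal_actions)


class TitForTatAgent:
    """Heuristic agent: cooperate first, then copy the opponent's previous move."""

    def __init__(self, player: int) -> None:
        self.name = "tit_for_tat"
        self.player = player

    def act(self, legal_actions: List[int], history: History) -> int:
        if not history:
            return 0
        return history[-1][1 - self.player]


class CallableAgent:
    """Hypothetical adapter: wraps any policy function, e.g. one that prompts an LLM
    and parses the reply into an action index."""

    def __init__(self, name: str, policy: Callable[[List[int], History], int]) -> None:
        self.name = name
        self.policy = policy

    def act(self, legal_actions: List[int], history: History) -> int:
        action = self.policy(legal_actions, history)
        # Assumed rule: an illegal move falls back to the first legal action.
        return action if action in legal_actions else legal_actions[0]


def play_match(agents: List[Agent], rounds: int) -> Tuple[float, float]:
    """Play `rounds` simultaneous-move rounds and return each player's total payoff."""
    history: History = []
    totals = [0.0, 0.0]
    for _ in range(rounds):
        a0 = agents[0].act(LEGAL_ACTIONS, history)
        a1 = agents[1].act(LEGAL_ACTIONS, history)
        r0, r1 = PAYOFFS[(a0, a1)]
        totals[0] += r0
        totals[1] += r1
        history.append((a0, a1))
    return totals[0], totals[1]
```

For example, `play_match([CallableAgent("always_defect", lambda legal, hist: 1), TitForTatAgent(player=1)], rounds=10)` yields `(14.0, 9.0)`. The defector gains 5 in the first round, and both players then settle into mutual defection. The fallback in `CallableAgent` reflects a practical concern for LLM agents, whose parsed output may not be a legal move. Substituting the first legal action is only one possible policy; a harness could instead re-prompt the model or count the move as a forfeit.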
