Abstract
Visual-language Chain-of-Thought (CoT) data resources are relatively scarce compared to their text-only counterparts, limiting the improvement of reasoning capabilities in Vision Language Models (VLMs). Moreover, high-quality vision-language reasoning data is expensive and labor-intensive to annotate. To address this issue, we leverage a promising resource: game code, which naturally contains logical structures and state transition processes. We therefore propose Code2Logic, a novel game-code-driven approach for multimodal reasoning data synthesis. Our approach leverages Large Language Models (LLMs) to adapt game code, enabling automatic acquisition of reasoning processes and results through code execution. Using the Code2Logic approach, we developed the GameQA dataset to train and evaluate VLMs. GameQA is cost-effective and scalable, offers controllable difficulty gradation, and is diverse, with 30 games and 158 tasks. Surprisingly, despite training solely on game data, VLMs demonstrated out-of-domain generalization; specifically, Qwen2.5-VL-7B improved its performance by 2.33% across 7 diverse vision-language benchmarks. Our code, dataset, and models are available at https://github.com/tongjingqi/Code2Logic.
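To make the idea of obtaining reasoning processes and results through code execution concrete, the following is a minimal sketch and not the authors' implementation. The toy grid game, the `QASample` type, and the `synthesize_sample` function are hypothetical names introduced only for illustration. The sketch also omits the rendering of the game state as an image, which a vision-language sample would need.

```python
from dataclasses import dataclass

# Hypothetical toy game for illustration; not taken from the Code2Logic repository.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class QASample:
    question: str
    reasoning: list[str]  # one entry per executed state transition
    answer: str


def synthesize_sample(size: int, start: tuple[int, int], moves: list[str]) -> QASample:
    """Run the game logic and record each state transition as a reasoning step."""
    row, col = start
    steps = []
    for i, move in enumerate(moves, start=1):
        d_row, d_col = MOVES[move]
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < size and 0 <= new_col < size:
            steps.append(
                f"Step {i}: move {move} from ({row}, {col}) to ({new_row}, {new_col})."
            )
            row, col = new_row, new_col
        else:
            # The move would leave the grid, so the state stays unchanged.
            steps.append(
                f"Step {i}: move {move} from ({row}, {col}) is blocked by the wall."
            )
    question = (
        f"On a {size}x{size} grid, the player starts at {start} and moves "
        f"{', '.join(moves)}. Where does the player end up?"
    )
    # The answer comes from executing the game logic, not from manual annotation.
    return QASample(question, steps, f"({row}, {col})")


sample = synthesize_sample(3, (0, 0), ["right", "down", "left", "left"])
print(sample.question)
print("\n".join(sample.reasoning))
print("Answer:", sample.answer)
```

In this sketch, the length of the move list and the grid size serve as simple difficulty knobs, which is one assumed way a synthesis pipeline could grade difficulty.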