Explore the Reasoning Capability of LLMs in the Chess Testbed

Abstract

Reasoning is a central capability of human intelligence. In recent years, with the advent of large-scale datasets, pretrained large language models have emerged with new capabilities, including reasoning. However, these models still struggle with long-term, complex reasoning tasks, such as playing chess. Based on the observation that expert chess players employ a dual approach combining long-term strategic play with short-term tactical play along with language explanation, we propose improving the reasoning capability of large language models in chess by integrating annotated strategy and tactics. Specifically, we collect a dataset named MATE, which consists of 1 million chess positions with candidate moves annotated by chess experts for strategy and tactics. We finetune the LLaMA-3-8B model and compare it against state-of-the-art commercial language models in the task of selecting better chess moves. Our experiments show that our models perform better than GPT, Claude, and Gemini models. We find that language explanations can enhance the reasoning capability of large language models.
