Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner

Abstract

Recently, inspired by OpenAI-o1/o3 and DeepSeek-R1, the R1-Style method based on reinforcement learning fine-tuning has received widespread attention from the community. Previous R1-Style methods mainly focus on mathematical reasoning and code intelligence, so it is of great research significance to verify their advantages on more general multimodal data. Charts are an important multimodal data type with rich information, and they pose significant research challenges in complex reasoning. In this work, we introduce Chart-R1, a chart-domain vision-language model with reinforcement learning fine-tuning to enable complex chart reasoning. To support Chart-R1, we first propose a novel programmatic data synthesis technology to generate high-quality step-by-step chart reasoning data covering single- and multi-subcharts, which makes up for the lack of reasoning data in the chart domain. Then we develop a two-stage training strategy: Chart-COT with step-by-step chain-of-thought supervision, and Chart-RFT with numerically sensitive reinforcement fine-tuning. Chart-COT aims to decompose complex chart reasoning tasks into fine-grained, understandable subtasks through step-by-step supervision, which lays a good foundation for improving the reasoning ability gained through reinforcement learning. Chart-RFT uses the typical group relative policy optimization strategy, in which a relatively soft reward is adopted for numerical responses to emphasize numerical sensitivity in the chart domain. We conduct extensive experiments on open-source benchmarks and a self-built chart reasoning dataset (i.e., ChartRQA). Experimental results show that Chart-R1 has significant advantages over chart-domain methods and is even comparable to open- and closed-source large-scale models (e.g., GPT-4o, Claude-3.5).
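
The abstract does not spell out the reward formula, so the following is only a minimal sketch of how a soft numerical reward could feed group-relative advantages in the style of group relative policy optimization. The function names, the 5% relative tolerance, the linear decay, and the use of the population standard deviation are all illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def soft_numeric_reward(pred: str, target: str, tol: float = 0.05) -> float:
    """Hypothetical soft reward (assumption, not the paper's formula).

    Numeric answers earn credit that decays linearly from 1.0 (exact match)
    to 0.0 (relative error >= tol). Non-numeric answers fall back to a
    case-insensitive exact match.
    """
    try:
        p, t = float(pred), float(target)
    except ValueError:
        return 1.0 if pred.strip().lower() == target.strip().lower() else 0.0
    if t == 0:
        # Relative error is undefined for a zero target; require an exact match.
        return 1.0 if p == 0 else 0.0
    rel_err = abs(p - t) / abs(t)
    return max(0.0, 1.0 - rel_err / tol)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its own sampling group.

    Each response's advantage is its reward minus the group mean, divided by
    the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: score a group of sampled answers to one chart question.
answers = ["40", "41", "50", "40.5"]
rewards = [soft_numeric_reward(a, "40") for a in answers]
advantages = group_relative_advantages(rewards)
```

Under these assumptions, a near-miss such as "41" against a target of "40" still receives partial credit, whereas a strict exact-match reward would score it the same as a completely wrong answer.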
