Abstract
The ability to read scientific plots effectively, known as chart understanding, is central to building effective agents for science. However, existing multimodal large language models (MLLMs), especially open-source ones, still fall behind, with typical success rates of 30%-50% on challenging benchmarks. Previous studies on fine-tuning MLLMs with synthetic charts are often restricted by the charts' inadequate similarity to real ones, which can compromise model training and performance on complex real-world charts. In this study, we show that modularizing chart generation and diversifying visual details improves chart understanding capabilities. In particular, we design a five-step data synthesis pipeline in which we separate data and function creation for single-plot generation, condition the generation of later subplots on earlier ones for multi-subplot figures, visually diversify the generated figures, filter out low-quality data, and finally generate question-answer (QA) pairs with GPT-4o. This approach allows us to streamline the generation of fine-tuning datasets and to introduce the effective chart dataset (ECD), which contains 10k+ chart images and 300k+ QA pairs, covering 25 topics and featuring 250+ chart type combinations with high visual complexity. We show that ECD consistently improves the performance of various MLLMs on a range of real-world and synthetic test sets. Code, data and models are available at: https://github.com/yuweiyang-anu/ECD.
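To make the modular structure of the five steps concrete, the following is a minimal Python sketch and not the released implementation. Every name in it (`Subplot`, `create_data`, `choose_plot_function`, `diversify`, `passes_filter`, `generate_qa`, `synthesize`) is hypothetical. The conditioning rule, the quality check, and the template question are toy placeholders chosen for illustration. In particular, the QA step stands in for the GPT-4o call and does not reproduce it.

```python
import random
from dataclasses import dataclass, field

# Hypothetical sketch of the five-step structure; not the authors' code.
CHART_TYPES = ["line", "bar", "scatter"]
COLORS = ["tab:blue", "tab:orange", "tab:green"]


@dataclass
class Subplot:
    chart_type: str
    data: list
    style: dict = field(default_factory=dict)


def create_data(rng, context, n=5):
    # Step 1 (data side): data is created separately from the plot function.
    # Step 2: later subplots are conditioned on earlier ones; as a toy rule,
    # the value range is taken from the previous subplot.
    scale = max(context[-1].data) if context else 10.0
    return [round(rng.uniform(0, scale), 2) for _ in range(n)]


def choose_plot_function(rng):
    # Step 1 (function side): pick how the data will be drawn.
    return rng.choice(CHART_TYPES)


def diversify(subplot, rng):
    # Step 3: vary visual details (assumed here: color and grid).
    subplot.style = {"color": rng.choice(COLORS), "grid": rng.random() < 0.5}
    return subplot


def passes_filter(figure):
    # Step 4: placeholder quality check that rejects flat subplots.
    return all(len(set(sp.data)) > 1 for sp in figure)


def generate_qa(figure):
    # Step 5: template stand-in for the GPT-4o QA generation.
    return [
        {"question": f"What is the maximum value in subplot {i + 1}?",
         "answer": max(sp.data)}
        for i, sp in enumerate(figure)
    ]


def synthesize(n_subplots, seed=0):
    rng = random.Random(seed)
    figure = []
    for _ in range(n_subplots):
        data = create_data(rng, figure)
        subplot = Subplot(choose_plot_function(rng), data)
        figure.append(diversify(subplot, rng))
    if not passes_filter(figure):
        return None  # filtered out as low quality
    return figure, generate_qa(figure)
```

Calling `synthesize(3)` returns either `None` (the figure was filtered out) or a pair of the three generated subplots and one QA entry per subplot. Keeping each step in its own function is what lets data, plot type, and styling vary independently.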