ChemDFM-R: A Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge

Abstract

While large language models (LLMs) have achieved impressive progress, their application in scientific domains such as chemistry remains hindered by shallow domain understanding and limited reasoning capabilities. In this work, we focus on the specific field of chemistry and develop a Chemical Reasoner LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized knowledge points to enhance the model's understanding of the fundamental principles and logical structure of chemistry. Then, we propose a mix-sourced distillation strategy that integrates expert-curated knowledge with general-domain reasoning skills, followed by domain-specific reinforcement learning to enhance chemical reasoning. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves state-of-the-art performance while providing interpretable, rationale-driven outputs. Further case studies illustrate how explicit reasoning chains significantly improve the reliability, transparency, and practical utility of the model in real-world human-AI collaboration scenarios.
