MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework

Abstract

Recent developments in Large Language Model (LLM)-based agents have shownimpressive capabilities spanning multiple domains, exemplified by deep researchsystems that demonstrate superior performance on complex information-seekingand synthesis tasks. While general-purpose deep research agents have shownimpressive capabilities, they struggle significantly with medical domainchallenges, as evidenced by leading proprietary systems achieving limitedaccuracy on complex medical benchmarks. The key limitations are: (1) the modellacks sufficient dense medical knowledge for clinical reasoning, and (2) theframework is constrained by the absence of specialized retrieval tools tailoredfor medical contexts. We present a medical deep research agent that addressesthese challenges through two core innovations. First, we develop a novel datasynthesis framework using medical knowledge graphs, extracting the longestchains from subgraphs around rare medical entities to generate complexmulti-hop question-answer pairs. Second, we integrate a custom-built privatemedical retrieval engine alongside general-purpose tools, enabling accuratemedical information synthesis. Our approach generates 2100+ diversetrajectories across 12 medical specialties, each averaging 4.2 toolinteractions. Through a two-stage training paradigm combining supervisedfine-tuning and online reinforcement learning with composite rewards, ourMedResearcher-R1-32B model demonstrates exceptional performance, establishingnew state-of-the-art results on medical benchmarks while maintainingcompetitive performance on general deep research tasks. Our work demonstratesthat strategic domain-specific innovations in architecture, tool design, andtraining data construction can enable smaller open-source models to outperformmuch larger proprietary systems in specialized domains.

Quick Read (beta)

loading the full paper ...