End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

  • 2025-08-21 17:42:47
  • Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie

Abstract

Accurate diagnosis with medical large language models is hindered by knowledge gaps and hallucinations. Retrieval- and tool-augmented methods help, but their impact is limited by weak use of external knowledge and poor feedback-reasoning traceability. To address these challenges, we introduce Deep-DxSearch, an agentic RAG system trained end-to-end with reinforcement learning (RL) that enables steerable, traceable retrieval-augmented reasoning for medical diagnosis. In Deep-DxSearch, we first construct a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources to support retrieval-aware reasoning across diagnostic scenarios. More crucially, we frame the LLM as the core agent and the retrieval corpus as its environment. We then use tailored rewards on format, retrieval, reasoning structure, and diagnostic accuracy, thereby evolving the agentic RAG policy from large-scale data through RL.

Experiments demonstrate that our end-to-end agentic RL training framework consistently outperforms prompt-engineering and training-free RAG approaches across multiple data centers. After training, Deep-DxSearch achieves substantial gains in diagnostic accuracy. It surpasses strong diagnostic baselines such as GPT-4o, DeepSeek-R1, and other medical-specific frameworks for both common and rare disease diagnosis under in-distribution and out-of-distribution settings. Moreover, ablation studies on reward design and retrieval corpus components confirm their critical roles, underscoring the uniqueness and effectiveness of our approach compared with traditional implementations. Finally, case studies and interpretability analyses highlight improvements in Deep-DxSearch's diagnostic policy. They provide deeper insight into its performance gains and support clinicians in delivering more reliable and precise preliminary diagnoses. See https://github.com/MAGIC-AI4Med/Deep-DxSearch.
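The abstract names four reward terms (format, retrieval, reasoning structure, and diagnostic accuracy) but does not say how each is computed or how they are combined. As a rough illustration only, the sketch below assumes a weighted sum of four scores in [0, 1]. Several details are hypothetical and are not taken from Deep-DxSearch: the tag names (`<think>`, `<search>`, `<answer>`), the `;` delimiter between candidate diagnoses, the precision-style retrieval score, the weights, and every function name.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Hypothetical weights; the abstract does not specify how terms are combined.
    fmt: float = 0.1
    retrieval: float = 0.2
    structure: float = 0.2
    accuracy: float = 0.5


def format_reward(trajectory: str) -> float:
    """1.0 if the trajectory contains exactly one <answer>...</answer> block (hypothetical tag)."""
    answers = re.findall(r"<answer>.*?</answer>", trajectory, flags=re.S)
    return 1.0 if len(answers) == 1 else 0.0


def retrieval_reward(retrieved: list[str], relevant: set[str]) -> float:
    """Precision of retrieved document IDs against a relevant set; 0.0 if nothing was retrieved."""
    if not retrieved:
        return 0.0
    return sum(doc in relevant for doc in retrieved) / len(retrieved)


def structure_reward(trajectory: str) -> float:
    """1.0 if a <think> block appears before the <answer> block (hypothetical ordering rule)."""
    think = trajectory.find("<think>")
    answer = trajectory.find("<answer>")
    return 1.0 if 0 <= think < answer else 0.0


def accuracy_reward(trajectory: str, gold: set[str]) -> float:
    """1.0 if any ';'-separated diagnosis in <answer> matches a gold label (case-insensitive)."""
    match = re.search(r"<answer>(.*?)</answer>", trajectory, flags=re.S)
    if match is None:
        return 0.0
    predicted = {d.strip().lower() for d in match.group(1).split(";") if d.strip()}
    return 1.0 if predicted & {g.lower() for g in gold} else 0.0


def total_reward(
    trajectory: str,
    retrieved: list[str],
    relevant: set[str],
    gold: set[str],
    weights: RewardWeights | None = None,
) -> float:
    """Weighted sum of the four hypothetical reward terms."""
    w = weights or RewardWeights()
    return (
        w.fmt * format_reward(trajectory)
        + w.retrieval * retrieval_reward(retrieved, relevant)
        + w.structure * structure_reward(trajectory)
        + w.accuracy * accuracy_reward(trajectory, gold)
    )


if __name__ == "__main__":
    traj = "<think>fever, rash</think><search>measles</search><answer>Measles; Rubella</answer>"
    print(total_reward(traj, ["d1", "d2"], {"d1"}, {"measles"}))
```

Keeping each term in [0, 1] makes the weights directly express how much each behavior counts toward the total. A real training setup might instead gate the other rewards on a valid format, or score the ranking of retrieved evidence rather than plain precision. Both are design choices this sketch does not attempt to settle.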
