Abstract
Chest X-rays (CXRs) are the most frequently performed imaging examinations in clinical settings. Recent advancements in Large Multimodal Models (LMMs) have enabled automated CXR interpretation, enhancing diagnostic accuracy and efficiency. However, despite their strong visual understanding, current Medical LMMs (MLMMs) still face two major challenges: (1) insufficient region-level understanding and interaction, and (2) limited accuracy and interpretability due to single-step reasoning. In this paper, we empower MLMMs with anatomy-centric reasoning capabilities to enhance their interactivity and explainability. Specifically, we first propose an Anatomical Ontology-Guided Reasoning (AOR) framework, which centers on cross-modal region-level information to facilitate multi-step reasoning. Next, under the guidance of expert physicians, we develop AOR-Instruction, a large instruction dataset for MLMM training. Our experiments demonstrate AOR's superior performance in both VQA and report generation tasks.