Abstract
Recent advances in reinforcement learning with verifiable, rule-based rewards have greatly enhanced the reasoning capabilities and out-of-distribution generalization of VLMs/LLMs, obviating the need for manually crafted reasoning chains. Despite these promising developments in the general domain, their translation to medical imaging remains limited. Current medical reinforcement fine-tuning (RFT) methods predominantly focus on close-ended VQA, thereby restricting the model's ability to engage in world knowledge retrieval and flexible task adaptation. More importantly, these methods fall short of addressing the critical clinical demand for open-ended, reasoning-intensive decision-making. To bridge this gap, we introduce \textbf{MedCCO}, the first multimodal reinforcement learning framework tailored for medical VQA that unifies close-ended and open-ended data within a curriculum-driven RFT paradigm. Specifically, MedCCO is initially fine-tuned on a diverse set of close-ended medical VQA tasks to establish domain-grounded reasoning capabilities, and is then progressively adapted to open-ended tasks to foster deeper knowledge enhancement and clinical interpretability. We validate MedCCO across eight challenging medical VQA benchmarks, spanning both close-ended and open-ended settings. Experimental results show that MedCCO consistently enhances performance and generalization, achieving an 11.4\% accuracy gain across three in-domain tasks, and a 5.7\% improvement on five out-of-domain benchmarks. These findings highlight the promise of curriculum-guided RL in advancing robust, clinically relevant reasoning in medical multimodal language models.