MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

Abstract

Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To bridge this gap, this paper introduces the first agent explicitly designed for the medical field, named Multi-modal Medical Agent (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools solving seven tasks, enabling the agent to choose the most suitable tools for a given task. Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o. Furthermore, MMedAgent exhibits efficiency in updating and integrating new medical tools.
