Abstract
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.