Capabilities of Gemini Models in Medicine

  • 2024-05-01 18:12:10
  • Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, Nenad Tomasev, Jan Freyberg, Charles Lau, Jonas Kemp, Jeremy Lai, Shekoofeh Azizi, Kimberly Kanada, SiWai Man, Kavita Kulkarni, Ruoxi Sun, Siamak Shakeri, Luheng He, Ben Caine, Albert Webson, Natasha Latysheva, Melvin Johnson, Philip Mansfield, Jian Lu, Ehud Rivlin, Jesper Anderson, Bradley Green, Renee Wong, Jonathan Krause, Jonathon Shlens, Ewa Dominowska, S. M. Ali Eslami, Katherine Chou, Claire Cui, Oriol Vinyals, Koray Kavu

Abstract

Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.

 
