Towards physician-centered oversight of conversational diagnostic AI

  • 2025-07-21 15:54:36
  • Elahe Vedadi, David Barrett, Natalie Harris, Ellery Wulczyn, Shashir Reddy, Roma Ruparel, Mike Schaekermann, Tim Strother, Ryutaro Tanno, Yash Sharma, Jihyeon Lee, Cían Hughes, Dylan Slack, Anil Palepu, Jan Freyberg, Khaled Saab, Valentin Liévin, Wei-Hung Weng, Tao Tu, Yun Liu, Nenad Tomasev, Kavita Kulkarni, S. Sara Mahdavi, Kelvin Guu, Joëlle Barral, Dale R. Webster, James Manyika, Avinatan Hassidim, Katherine Chou, Yossi Matias, Pushmeet Kohli, Adam Rodman, Vivek Natarajan, Alan Karthikesalingam, David Stutz

Abstract

Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity by licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability of the clinical decision. This effectively decouples oversight from intake and can thus happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs or a group of PCPs under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practices and likely underestimates clinicians' capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm for diagnostic AI systems to operate under expert human oversight for enhancing real-world care.

 
