Abstract
Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity by licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability of the clinical decision. This effectively decouples oversight from intake and can thus happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs or a group of PCPs under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practices and likely underestimates clinicians' capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm for diagnostic AI systems to operate under expert human oversight for enhancing real-world care.