Abstract
Effective human-AI decision-making balances three key factors: the \textit{correctness} of predictions, the \textit{cost} of knowledge and reasoning complexity, and the confidence about whether to \textit{abstain} from automated answers or involve human experts. In this work, we present a cascaded LLM decision framework that adaptively delegates tasks across multiple tiers of expertise -- a base model for initial candidate answers, a more capable and knowledgeable (but costlier) large model, and a human expert for when the model cascade abstains. Our method proceeds in two stages. First, a deferral policy determines whether to accept the base model's answer or regenerate it with the large model based on the confidence score. Second, an abstention policy decides whether the cascade model response is sufficiently certain or requires human intervention. Moreover, we incorporate an online learning mechanism in the framework that can leverage human feedback to improve decision quality over time. We demonstrate this approach on general question-answering (ARC-Easy and ARC-Challenge) and medical question-answering (MedQA and MedMCQA). Our results show that our cascaded strategy outperforms single-model baselines in accuracy in most cases while reducing cost and providing a principled way to handle abstentions.
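
To make the two-stage control flow concrete, the following is a minimal Python sketch of a deferral step followed by an abstention step. It is an illustration under stated assumptions, not the implementation evaluated in this work: the fixed thresholds `defer_threshold` and `abstain_threshold`, the `Model` callable interface, and the `CascadeResult` type are all hypothetical names introduced here, and the online learning mechanism that updates the policies from human feedback is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interface (an assumption, not from the paper): a model maps a
# question to an (answer, confidence) pair with confidence in [0, 1].
Model = Callable[[str], Tuple[str, float]]


@dataclass
class CascadeResult:
    answer: str
    source: str  # one of "base", "large", "human"


def cascade(
    question: str,
    base: Model,
    large: Model,
    human: Callable[[str], str],
    defer_threshold: float = 0.7,    # assumed fixed threshold, for illustration
    abstain_threshold: float = 0.5,  # assumed fixed threshold, for illustration
) -> CascadeResult:
    # Stage 1 (deferral): keep the base model's answer only if it is
    # confident enough; otherwise regenerate with the large model.
    answer, confidence = base(question)
    source = "base"
    if confidence < defer_threshold:
        answer, confidence = large(question)
        source = "large"

    # Stage 2 (abstention): if the cascade's answer is still not confident
    # enough, abstain and hand the question to the human expert.
    if confidence < abstain_threshold:
        return CascadeResult(answer=human(question), source="human")
    return CascadeResult(answer=answer, source=source)
```

In this sketch, simple thresholds stand in for the deferral and abstention policies; a learned policy could replace either comparison without changing the surrounding control flow.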