Abstract
Current evaluations of agents remain centered around one-shot task completion, failing to account for the inherently iterative and collaborative nature of many real-world problems, where human goals are often underspecified and evolve. We argue for a shift from building and assessing task completion agents to developing collaborative agents, assessed not only by the quality of their final outputs but by how well they engage with and enhance human effort throughout the problem-solving process. To support this shift, we introduce collaborative effort scaling, a framework that captures how an agent's utility grows with increasing user involvement. Through case studies and simulated evaluations, we show that state-of-the-art agents often underperform in multi-turn, real-world scenarios, revealing a missing ingredient in agent design: the ability to sustain engagement and scaffold user understanding. Collaborative effort scaling offers a lens for diagnosing agent behavior and guiding development toward more effective interactions.