DigiData: Training and Evaluating General-Purpose Mobile Control Agents

Abstract

AI agents capable of controlling user interfaces have the potential to transform human interaction with digital devices. To accelerate this transformation, two fundamental building blocks are essential: high-quality datasets that enable agents to achieve complex and human-relevant goals, and robust evaluation methods that allow researchers and practitioners to rapidly enhance agent performance. In this paper, we introduce DigiData, a large-scale, high-quality, diverse, multi-modal dataset designed for training mobile control agents. Unlike existing datasets, which derive goals from unstructured interactions, DigiData is meticulously constructed through comprehensive exploration of app features, resulting in greater diversity and higher goal complexity. Additionally, we present DigiData-Bench, a benchmark for evaluating mobile control agents on real-world complex tasks. We demonstrate that the commonly used step-accuracy metric falls short in reliably assessing mobile control agents and, to address this, we propose dynamic evaluation protocols and AI-powered evaluations as rigorous alternatives for agent assessment. Our contributions aim to significantly advance the development of mobile control agents, paving the way for more intuitive and effective human-device interactions.
