Abstract
Cloud-based mobile agents powered by (multimodal) large language models ((M)LLMs) offer strong reasoning abilities but suffer from high latency and cost. While fine-tuned (multimodal) small language models ((M)SLMs) enable edge deployment, they often lose general capabilities and struggle with complex tasks. To address this, we propose EcoAgent, an Edge-Cloud cOllaborative multi-agent framework for mobile automation. EcoAgent features a closed-loop collaboration among a cloud-based Planning Agent and two edge-based agents: the Execution Agent for action execution and the Observation Agent for verifying outcomes. The Observation Agent uses a Pre-Understanding Module to compress screen images into concise text, reducing token usage. In case of failure, the Planning Agent retrieves screen history and replans via a Reflection Module. Experiments on AndroidWorld show that EcoAgent maintains high task success rates while significantly reducing MLLM token consumption, enabling efficient and practical mobile automation.
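
The closed loop described above can be illustrated with a minimal sketch. It is not the EcoAgent implementation: every name here (`Step`, `RunResult`, `run_task`, `demo`, the `plan`/`replan`/`execute`/`describe_screen` callables, and `max_replans`) is hypothetical, the agents are plain functions instead of models, and outcome verification is reduced to a substring check on the compressed screen text. The sketch only shows the control flow: the cloud planner is called once up front and again only when an edge-side check fails, and it receives text summaries of past screens instead of images.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    action: str       # instruction handed to the edge Execution Agent
    expectation: str  # text the edge Observation Agent looks for afterwards


@dataclass
class RunResult:
    success: bool
    cloud_calls: int  # how many times the cloud planner was invoked
    history: List[str] = field(default_factory=list)  # compressed screen texts


def run_task(
    task: str,
    plan: Callable[[str], List[Step]],               # cloud: initial plan
    replan: Callable[[str, List[str]], List[Step]],  # cloud: reflect on history
    execute: Callable[[str], str],                   # edge: act, return raw screen
    describe_screen: Callable[[str], str],           # edge: screen -> concise text
    max_replans: int = 2,                            # assumed budget, not from the paper
) -> RunResult:
    steps = plan(task)
    cloud_calls = 1
    history: List[str] = []
    replans = 0
    i = 0
    while i < len(steps):
        step = steps[i]
        screen = execute(step.action)
        summary = describe_screen(screen)  # only text is kept, not the image
        history.append(summary)
        if step.expectation in summary:    # stand-in for outcome verification
            i += 1
            continue
        if replans >= max_replans:
            return RunResult(False, cloud_calls, history)
        # Failure: send the text history to the cloud planner and start the new plan.
        steps = replan(task, history)
        cloud_calls += 1
        replans += 1
        i = 0
    return RunResult(True, cloud_calls, history)


def demo(max_replans: int = 2) -> RunResult:
    # Toy device: each action maps to a fake "screenshot" string.
    screens = {
        "open settings": "<img: Settings page>",
        "tap wifi": "<img: Settings page, Wi-Fi off>",
        "toggle wifi switch": "<img: Settings page, Wi-Fi on>",
    }

    def plan(task: str) -> List[Step]:
        return [Step("open settings", "settings page"), Step("tap wifi", "wi-fi on")]

    def replan(task: str, history: List[str]) -> List[Step]:
        return [Step("toggle wifi switch", "wi-fi on")]

    def execute(action: str) -> str:
        return screens[action]

    def describe_screen(screen: str) -> str:
        return screen.strip("<>").replace("img: ", "").lower()

    return run_task("turn on wifi", plan, replan, execute, describe_screen, max_replans)


if __name__ == "__main__":
    print(demo())
```

In this toy run the second step fails its check, so the planner is consulted a second time with the text history and the revised step succeeds. Setting `max_replans=0` makes the same run stop at the first failed check.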