MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents

Abstract

Building general-purpose graphical user interface (GUI) agents has become increasingly promising with the progress in vision-language models. However, developing effective mobile GUI agents with reinforcement learning (RL) remains challenging due to the heavy-tailed distribution of task difficulty and the inefficiency of large-scale environment sampling. We present MobileRL, an online agentic reinforcement learning framework that enhances GUI agents in mobile environments. Its core component is the Difficulty-ADAptive GRPO (ADAGRPO) algorithm. In ADAGRPO, we design difficulty-adaptive positive replay and failure curriculum filtering to adapt the model to different task difficulties. We also introduce a shortest-path reward adjustment strategy that reshapes rewards with respect to task length in multi-turn agentic tasks. Together, these strategies stabilize RL training, improve sample efficiency, and yield strong performance across diverse mobile apps and tasks. We apply MobileRL to two open models (Qwen2.5-VL-7B-Instruct and GLM-4.1V-9B-Base). The resulting MobileRL-9B model achieves state-of-the-art success rates on both AndroidWorld (80.2%) and AndroidLab (53.6%). The MobileRL framework is open-sourced at: https://github.com/THUDM/MobileRL.
