WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Abstract

Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents heavily rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents, including the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates 1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, 2) a robust outcome-supervised reward model (ORM), and 3) adaptive reinforcement learning strategies to ensure consistent improvements. We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL improves the success rate of Llama-3.1-8B from 4.8% to 42.4%, and from 6.1% to 43% for GLM-4-9B. These open models significantly surpass the performance of GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%). Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems.
