Build the web for agents, not agents for the web

Abstract

Recent advancements in Large Language Models (LLMs) and multimodal counterparts have spurred significant interest in developing web agents -- AI systems capable of autonomously navigating and completing tasks within web environments. While holding tremendous promise for automating complex web interactions, current approaches face substantial challenges due to the fundamental mismatch between human-designed interfaces and LLM capabilities. Current methods struggle with the inherent complexity of web inputs, whether processing massive DOM trees, relying on screenshots augmented with additional information, or bypassing the user interface entirely through API interactions. This position paper advocates for a paradigm shift in web agent research: rather than forcing web agents to adapt to interfaces designed for humans, we should develop a new interaction paradigm specifically optimized for agentic capabilities. To this end, we introduce the concept of an Agentic Web Interface (AWI), an interface specifically designed for agents to navigate a website. We establish six guiding principles for AWI design, emphasizing safety, efficiency, and standardization, to account for the interests of all primary stakeholders. This reframing aims to overcome fundamental limitations of existing interfaces, paving the way for more efficient, reliable, and transparent web agent design, which will be a collaborative effort involving the broader ML community.
