InSTA: Towards Internet-Scale Training For Agents

Abstract

The predominant approach for training web navigation agents is to gather human demonstrations for a set of popular websites and hand-written tasks, but it is becoming clear that human data is an inefficient resource. We develop a pipeline to facilitate internet-scale training for agents without laborious human annotations. In the first stage, an LLM annotates 150k sites with agentic tasks. In the next stage, LLM agents complete tasks and produce trajectories. In the final stage, an LLM filters trajectories by judging their success. Language models are powerful data curation tools, identifying harmful content with an accuracy of 97%, judging successful trajectories with an accuracy of 82.6%, and producing effective data. We train agents based on Qwen 3 1.7B that are competitive with frontier LLMs as web agents, while being smaller and faster. Our top agent reaches a success rate of 56.9%, outperforming the data collection policy Qwen 3 235B, a 235 times larger Llama 4 Maverick, and reaching 94.7% of the performance of Gemini 2.5 Flash. We are releasing code, models and data at: https://data-for-agents.github.io.
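
The three stages described above can be read as a simple annotate, act, filter loop. The following is a minimal sketch under stated assumptions, not the released implementation: the names `Trajectory`, `run_pipeline`, `propose_task`, `run_agent`, and `judge_success` are hypothetical, and the three callables stand in for the LLM calls at each stage. The toy stand-ins at the bottom replace the LLMs with trivial rules so the control flow can be run offline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Trajectory:
    """One agent attempt at a task on a site (hypothetical structure)."""
    site: str
    task: str
    steps: List[str]


def run_pipeline(
    sites: Iterable[str],
    propose_task: Callable[[str], Optional[str]],
    run_agent: Callable[[str, str], Trajectory],
    judge_success: Callable[[Trajectory], bool],
) -> List[Trajectory]:
    """Annotate sites with tasks, collect trajectories, keep judged successes."""
    kept: List[Trajectory] = []
    for site in sites:
        # Stage 1: an annotator proposes a task, or None to skip the site
        # (for example, if the site is flagged as harmful).
        task = propose_task(site)
        if task is None:
            continue
        # Stage 2: an agent attempts the task and produces a trajectory.
        trajectory = run_agent(site, task)
        # Stage 3: a judge filters out trajectories it deems unsuccessful.
        if judge_success(trajectory):
            kept.append(trajectory)
    return kept


# Toy stand-ins for the three LLM calls (assumptions for illustration only).
def toy_propose(site: str) -> Optional[str]:
    return None if "unsafe" in site else f"find the contact page on {site}"


def toy_agent(site: str, task: str) -> Trajectory:
    return Trajectory(site, task, [f"goto {site}", "click contact"])


def toy_judge(trajectory: Trajectory) -> bool:
    return trajectory.site != "broken.example"


kept = run_pipeline(
    ["good.example", "unsafe.example", "broken.example"],
    toy_propose,
    toy_agent,
    toy_judge,
)
```

In this toy run, one site is skipped at the annotation stage and one trajectory is rejected by the judge, so only the trajectory for `good.example` remains in `kept`.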
