SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model

Abstract

AI agents built on large language models (LLMs) hold enormous promise, but current practice focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also suffers from the fundamental limitations of autoregressive LLMs. Humans, on the other hand, are general agents who reason by mentally simulating the outcomes of their actions and plans. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of an optimal agent in any environment, SimuRA overcomes the limitations of autoregressive reasoning by introducing a world model for planning via simulation. The generalized world model is implemented using an LLM, which can flexibly plan in a wide range of environments using the concept-rich latent space of natural language. Experiments on difficult web browsing tasks show that SimuRA improves the success rate of flight search from 0% to 32.2%. World-model-based planning, in particular, shows a consistent advantage of up to 124% over autoregressive planning, demonstrating the advantage of world model simulation as a reasoning paradigm. We are excited about the possibility of training a single, general agent model based on LLMs that can act superintelligently in all environments. To start, we make a web-browsing agent built on SimuRA with pretrained LLMs available as a research demo for public testing.
