Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Abstract

Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective-oriented multi-step tasks on their own, rather than merely responding to queries from human users. Most existing language agents, however, are not optimized using environment-specific rewards. Although some agents enable iterative refinement through verbal feedback, they do not reason and plan in ways that are compatible with gradient-based learning from rewards. This paper introduces a principled framework for reinforcing large language agents by learning a retrospective model, which automatically tunes the language agent prompts from environment feedback through policy gradient. Specifically, our proposed agent architecture learns from rewards across multiple environments and tasks to fine-tune a pre-trained language model, which refines the language agent prompt by summarizing the root cause of prior failed attempts and proposing action plans. Experimental results on various tasks demonstrate that the language agents improve over time and that our approach considerably outperforms baselines that do not properly leverage gradients from the environment. These results suggest that policy gradient optimization, for which we believe our work is one of the first applications to language agents, is a promising way to improve such agents and can be applied to optimize other models in the agent architecture to enhance agent performance over time.
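
To make the policy-gradient idea concrete, the following is a minimal toy sketch and not the implementation described in this paper. It assumes the retrospective model can be stood in for by a softmax policy over a small fixed set of candidate reflection prompts, and that the environment returns a binary success reward. The names `softmax`, `reinforce_step`, `train`, and the `success_rates` values are hypothetical and chosen only for illustration; in the paper the policy is a fine-tuned pre-trained language model rather than a table of logits.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (numerically stabilized)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, baseline, lr):
    """One REINFORCE update for a softmax policy.

    For a softmax policy, d/d(logit_i) log pi(action) = 1[i == action] - pi(i),
    so each logit moves by lr * (reward - baseline) * that gradient.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def train(success_rates, episodes=2000, lr=0.1, seed=0):
    """Toy loop: sample a candidate reflection, observe reward, update policy.

    success_rates[i] is an ASSUMED probability that the agent succeeds when
    prompted with hypothetical candidate reflection i.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(success_rates)
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = 1.0 if rng.random() < success_rates[action] else 0.0
        logits = reinforce_step(logits, action, reward, baseline, lr)
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical success rates for three candidate reflections.
    print(train([0.2, 0.5, 0.8]))
```

The update raises the probability of a reflection whenever it leads to a reward above the running baseline and lowers it otherwise, which is the basic mechanism by which environment rewards can shape what a retrospective model proposes.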
