Aviary: training language agents on challenging scientific tasks

Abstract

Solving complex real-world tasks requires cycles of actions and observations. This is particularly true in science, where tasks require many cycles of analysis, tool use, and experimentation. Language agents are promising for automating intellectual tasks in science because they can interact with tools via natural language or code. Yet their flexibility creates conceptual and practical challenges for software implementations, since agents may comprise non-standard components such as internal reasoning, planning, tool usage, as well as the inherent stochasticity of temperature-sampled language models. Here, we introduce Aviary, an extensible gymnasium for language agents. We formalize agents as policies solving language-grounded partially observable Markov decision processes, which we term language decision processes. We then implement five environments, including three challenging scientific environments: (1) manipulating DNA constructs for molecular cloning, (2) answering research questions by accessing scientific literature, and (3) engineering protein stability. These environments were selected for their focus on multi-step reasoning and their relevance to contemporary biology research. Finally, with online training and scaling inference-time compute, we show that language agents backed by open-source, non-frontier LLMs can match and exceed both frontier LLM agents and human experts on multiple tasks at up to 100x lower inference cost.
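
To make the idea of a language decision process concrete, the following is a minimal sketch of an agent-environment loop in which observations and actions are plain text. It is illustrative only and does not use Aviary's actual API: the names CountdownEnv, noisy_policy, and rollout are hypothetical, and the random policy merely stands in for a temperature-sampled language model.

```python
import random
from dataclasses import dataclass


@dataclass
class CountdownEnv:
    """Toy text environment (hypothetical, not an Aviary class).

    The hidden state is an integer counter; the agent only sees text.
    """
    target: int = 3
    remaining: int = 0

    def reset(self) -> str:
        self.remaining = self.target
        return "Task started. Reply 'step' to make progress."

    def step(self, action: str) -> tuple[str, float, bool]:
        # Only the exact action 'step' advances the hidden counter.
        if action.strip().lower() == "step":
            self.remaining -= 1
        done = self.remaining == 0
        reward = 1.0 if done else 0.0
        obs = "Task complete." if done else "Not finished yet. Reply 'step'."
        return obs, reward, done


def noisy_policy(observation: str, rng: random.Random) -> str:
    """Stand-in for a stochastic LLM policy: usually helpful, sometimes not."""
    return "step" if rng.random() < 0.8 else "wait"


def rollout(env, policy, rng, max_steps=20):
    """Run one episode and return (total reward, trajectory)."""
    obs = env.reset()
    total = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs, rng)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        total += reward
        obs = next_obs
        if done:
            break
    return total, trajectory


if __name__ == "__main__":
    total, trajectory = rollout(CountdownEnv(), noisy_policy, random.Random(0))
    print(f"reward={total}, steps={len(trajectory)}")
```

The policy sees only the text observation and never the counter itself, which is the sense in which the process is partially observable. Because the policy is sampled, repeated rollouts on the same task can yield different trajectories.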
