Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym

Abstract

The formidable capacity of language agents for zero- or few-shot decision-making encourages us to pose a compelling question: can language agents be alternatives to PPO agents in traditional sequential decision-making tasks? To investigate this, we first take environments collected in OpenAI Gym as our testbeds and ground them to textual environments, which together form the TextGym simulator. Given the widespread adoption of OpenAI Gym, this allows for straightforward and efficient comparisons between PPO agents and language agents. To ensure fair and effective benchmarking, we introduce five levels of scenarios for precise control of domain knowledge, along with a unified RL-inspired framework for language agents. Additionally, we propose an explore-exploit-guided language (EXE) agent to solve tasks within TextGym. Through numerical experiments and ablation studies, we extract insights into the decision-making capabilities of language agents and make a preliminary evaluation of their potential to be alternatives to PPO in classical sequential decision-making problems. This paper sheds light on the performance of language agents and paves the way for future research in this domain. Our code is publicly available at https://github.com/mail-ecnu/Text-Gym-Agents.
