Abstract
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent's current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks. Code is available at https://github.com/yuqingd/ellm.