Abstract
Reinforcement learning (RL) algorithms typically start tabula rasa, without any prior knowledge of the environment and without any prior skills. This, however, often leads to low sample efficiency, requiring a large amount of interaction with the environment. This is especially true in a lifelong learning setting, in which the agent needs to continually extend its capabilities. In this paper, we examine how a pre-trained task-independent language model can make a goal-conditional RL agent more sample efficient. We do this by facilitating transfer learning between different related tasks. We experimentally demonstrate our approach on a set of object navigation tasks.