Abstract
The Dreamer agent provides various benefits of Model-Based Reinforcement Learning (MBRL), such as sample efficiency, reusable knowledge, and safe planning. However, its world model and policy networks inherit the limitations of recurrent neural networks. An important question is therefore how an MBRL framework can benefit from recent advances in transformers and what challenges arise in doing so. In this paper, we propose a transformer-based MBRL agent called TransDreamer. We first introduce the Transformer State-Space Model, a world model that leverages a transformer for dynamics predictions. We then share this world model with a transformer-based policy network and obtain stability in training a transformer-based RL agent. In experiments, we apply the proposed model to 2D visual RL and 3D first-person visual RL tasks, both of which require long-range memory access for memory-based reasoning. We show that the proposed model outperforms Dreamer on these complex tasks.