Abstract
We explore the capability of evolution strategies to train an agent whose policy is based on a transformer architecture in a reinforcement learning setting. We performed experiments using OpenAI's highly parallelizable evolution strategy to train Decision Transformer in the Humanoid locomotion environment and in the environment of Atari games, testing the ability of this black-box optimization technique to train even such relatively large and complicated models (compared to those previously tested in the literature). We also proposed a method to aid the training by first pretraining the model before using OpenAI-ES to train it further, and tested its effectiveness. The examined evolution strategy proved to be, in general, capable of achieving strong results and managed to obtain high-performing agents. Therefore, the pretraining was shown to be unnecessary; yet still, it helped us observe and formulate several further insights.