Solving Deep Reinforcement Learning Tasks with Evolution Strategies and Linear Policy Networks

Abstract

Although deep reinforcement learning methods can learn effective policies for challenging problems such as Atari games and robotics tasks, algorithms are complex, and training times are often long. This study investigates how Evolution Strategies perform compared to gradient-based deep reinforcement learning methods. We use Evolution Strategies to optimize the weights of a neural network via neuroevolution, performing direct policy search. We benchmark both deep policy networks and networks consisting of a single linear layer from observations to actions for three gradient-based methods, such as Proximal Policy Optimization. These methods are evaluated against three classical Evolution Strategies and Augmented Random Search, which all use linear policy networks. Our results reveal that Evolution Strategies can find effective linear policies for many reinforcement learning benchmark tasks, unlike deep reinforcement learning methods that can only find successful policies using much larger networks, suggesting that current benchmarks are easier to solve than previously assumed. Interestingly, Evolution Strategies also achieve results comparable to gradient-based deep reinforcement learning algorithms for higher-complexity tasks. Furthermore, we find that by directly accessing the memory state of the game, Evolution Strategies can find successful policies in Atari that outperform the policies found by Deep Q-Learning. Evolution Strategies also outperform Augmented Random Search in most benchmarks, demonstrating superior sample efficiency and robustness in training linear policy networks.
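
To make the idea of direct policy search with a linear policy concrete, the following is a minimal sketch and not any of the specific algorithms benchmarked in the paper. It assumes a hypothetical toy point-mass task in place of a real benchmark environment, a basic truncation-selection Evolution Strategy with a fixed step size, and arbitrarily chosen hyperparameters. The names `rollout` and `simple_es` are illustrative only.

```python
import numpy as np

def rollout(weights, steps=50):
    """Return of a linear policy on a hypothetical toy point-mass task."""
    W = weights.reshape(1, 2)            # single linear layer: 2 observations -> 1 action
    pos, vel = 1.0, 0.0
    total = 0.0
    for _ in range(steps):
        obs = np.array([pos, vel])
        action = float(np.clip(W @ obs, -1.0, 1.0)[0])  # action = W * obs, clipped
        vel += 0.1 * action
        pos += 0.1 * vel
        total -= pos ** 2                # reward: stay close to the origin
    return total

def simple_es(n_params=2, pop=16, elite=4, sigma=0.5, generations=30, seed=0):
    """Basic ES with truncation selection (assumed here for illustration)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(n_params)
    best_weights, best_return = mean.copy(), rollout(mean)
    for _ in range(generations):
        # Sample a population of weight vectors around the current mean.
        candidates = mean + sigma * rng.standard_normal((pop, n_params))
        returns = np.array([rollout(c) for c in candidates])
        # Recombine the highest-return candidates into the new mean.
        order = np.argsort(returns)
        mean = candidates[order[-elite:]].mean(axis=0)
        # Keep the best single candidate seen so far.
        if returns[order[-1]] > best_return:
            best_return = returns[order[-1]]
            best_weights = candidates[order[-1]].copy()
    return best_weights

weights = simple_es()
```

The search uses only episode returns and never computes a gradient of the policy, which is the sense in which the weights are optimized by direct policy search. Comparing `rollout(weights)` against `rollout(np.zeros(2))` shows whether the evolved linear policy improves on the do-nothing policy for this toy task.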
