Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

Abstract

Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e. contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES to achieve higher performance on Atari and simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
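The hybridization described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: novelty is taken to be the mean distance from an agent's behavior to its k nearest neighbors in an archive of past behaviors, the ES update is a simple Gaussian-perturbation gradient estimate with standardized scores, and reward and novelty are blended with a fixed weight `w` (`w = 1` is plain ES, `w = 0` is pure novelty seeking). The toy 2-D "policy", the names `novelty`, `es_step`, `blended_score`, and all hyperparameters are hypothetical, and the sketch tracks a single agent rather than a population.

```python
import numpy as np

def novelty(bc, archive, k=3):
    """Mean Euclidean distance from behavior `bc` to its k nearest archive entries."""
    dists = np.linalg.norm(np.asarray(archive) - bc, axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())

def blended_score(reward_fn, behavior_fn, archive, w):
    """Score = w * reward + (1 - w) * novelty (assumed blending rule)."""
    def score(theta):
        return w * reward_fn(theta) + (1.0 - w) * novelty(behavior_fn(theta), archive)
    return score

def es_step(theta, score_fn, rng, sigma=0.1, alpha=0.01, n=50):
    """One ES update: sample Gaussian perturbations, weight each by its standardized score."""
    eps = rng.standard_normal((n, theta.size))
    scores = np.array([score_fn(theta + sigma * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad = eps.T @ scores / (n * sigma)  # Monte Carlo gradient estimate
    return theta + alpha * grad

# Toy setup (hypothetical): parameters are a 2-D point, behavior is the point itself.
goal = np.array([1.0, 1.0])
reward_fn = lambda th: -float(np.sum((th - goal) ** 2))
behavior_fn = lambda th: th

rng = np.random.default_rng(0)
theta = np.zeros(2)
archive = [behavior_fn(theta).copy()]
for _ in range(20):
    score_fn = blended_score(reward_fn, behavior_fn, archive, w=0.5)
    theta = es_step(theta, score_fn, rng)
    archive.append(behavior_fn(theta).copy())  # remember visited behaviors
```

Because the archive grows every iteration, the novelty term keeps changing, which is what pushes the search away from behaviors it has already visited even when the reward gradient points into a local optimum.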
