Proximal Distilled Evolutionary Reinforcement Learning

Abstract

Reinforcement Learning (RL) has achieved impressive performance in many complex environments due to its integration with Deep Neural Networks (DNNs). At the same time, Genetic Algorithms (GAs), often seen as a competing approach to RL, have had limited success in scaling up to the DNNs required to solve challenging tasks. Contrary to this dichotomic view, in the physical world, evolution and learning are complementary processes that continuously interact. The recently proposed Evolutionary Reinforcement Learning (ERL) framework has demonstrated mutual benefits to performance when combining the two methods. However, ERL has not fully addressed the scalability problem of GAs. In this paper, we show that this problem is rooted in an unfortunate combination of a simple genetic encoding for DNNs and the use of traditional biologically-inspired variation operators. When applied to these encodings, the standard operators are destructive and cause catastrophic forgetting of the traits the networks acquired. We propose a novel algorithm called Proximal Distilled Evolutionary Reinforcement Learning (PDERL) that is characterised by a hierarchical integration between evolution and learning. The main innovation of PDERL is the use of learning-based variation operators that compensate for the simplicity of the genetic representation. Unlike traditional operators, our proposals meet the functional requirements of variation operators when applied to directly-encoded DNNs. We evaluate PDERL in five robot locomotion settings from the OpenAI Gym. Our method outperforms ERL, as well as two state-of-the-art RL algorithms, PPO and TD3, in all tested environments.
