Evolve To Control: Evolution-based Soft Actor-Critic for Scalable Reinforcement Learning

Abstract

Advances in Reinforcement Learning (RL) have successfully tackled sample efficiency and overestimation bias. However, these methods often fall short of scalable performance. Genetic methods, on the other hand, provide scalability but exhibit hyperparameter sensitivity to evolutionary operations. We present the Evolution-based Soft Actor-Critic (ESAC), a scalable RL algorithm. Our contributions are threefold: ESAC (1) abstracts exploration from exploitation by combining Evolution Strategies (ES) with Soft Actor-Critic (SAC), (2) provides dominant skill transfer between offspring by making use of soft winner selections and genetic crossovers in hindsight, and (3) reduces hyperparameter sensitivity in evolutions using Automatic Mutation Tuning (AMT). AMT gradually replaces the entropy framework of SAC, allowing the population to succeed at the task while acting as randomly as possible, without making use of backpropagation updates. On a range of challenging control tasks consisting of high-dimensional action spaces and sparse rewards, ESAC demonstrates state-of-the-art performance and sample efficiency equivalent to SAC. ESAC demonstrates scalability comparable to ES on the basis of hardware resources and algorithm overhead. A complete implementation of ESAC with notes on reproducibility and videos can be found at the project website: https://karush17.github.io/esac-web/.
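
For readers unfamiliar with the ES component named in the abstract, the following is a minimal sketch of a generic Evolution Strategies update on a toy objective. It is not the ESAC algorithm: it omits SAC, soft winner selection, crossover, and AMT, and uses a fixed mutation scale. All names (`es_step`, `toy_fitness`, `run`) and parameter values are hypothetical and chosen only for illustration.

```python
import random

def es_step(theta, fitness, pop_size=50, sigma=0.1, lr=0.05, rng=None):
    """One generic ES update (illustrative sketch, not ESAC itself)."""
    rng = rng or random.Random(0)
    dim = len(theta)
    # Sample Gaussian perturbations (mutations) around the current parameters.
    noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    # Evaluate each perturbed candidate; no backpropagation is involved.
    scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
    # Subtract the mean score as a baseline to reduce estimator variance.
    mean = sum(scores) / pop_size
    adv = [s - mean for s in scores]
    # Gradient estimate: score-weighted average of the perturbations.
    grad = [
        sum(a * eps[i] for a, eps in zip(adv, noises)) / (pop_size * sigma)
        for i in range(dim)
    ]
    # Gradient ascent on the estimated fitness gradient.
    return [t + lr * g for t, g in zip(theta, grad)]

def toy_fitness(params):
    # Hypothetical objective, maximized at params == [1.0, -2.0].
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)

def run(steps=200, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        theta = es_step(theta, toy_fitness, rng=rng)
    return theta
```

Calling `run()` moves the parameters from the origin toward the optimum of the toy objective. The fixed `sigma` here is the kind of mutation hyperparameter whose sensitivity the abstract says AMT is meant to address.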
