Collaborative Evolutionary Reinforcement Learning

Abstract

Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically struggle with achieving effective exploration and are extremely sensitive to the choice of hyperparameters. One reason is that most approaches use a noisy version of their operating policy to explore, thereby limiting the range of exploration. In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. A collection of learners, typically proven algorithms like TD3, optimize over varying time horizons, leading to this diverse portfolio. All learners contribute to and use a shared replay buffer to achieve greater sample efficiency. Computational resources are dynamically distributed to favor the best learners as a form of online algorithm selection. Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner. Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient, notably solving the Mujoco Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation.
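
The shared replay buffer and the dynamic allocation of compute among learners can be illustrated with a toy sketch. The abstract does not say which allocation rule is used, so the UCB1-style score here is an assumption, as are all names (`SharedReplayBuffer`, `ToyLearner`, `ucb_pick`, `allocate`), the discount values, and the synthetic return function that stands in for a real environment. The neuroevolution component is left out entirely.

```python
import math
import random
from collections import deque


class SharedReplayBuffer:
    """One buffer that every learner writes to and samples from."""

    def __init__(self, capacity=10_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


class ToyLearner:
    """Hypothetical stand-in for a learner such as TD3; only the discount differs."""

    def __init__(self, gamma):
        self.gamma = gamma
        self.rollouts = 0        # rollouts allocated to this learner so far
        self.mean_return = 0.0   # running mean of the returns it obtained

    def rollout(self, buffer, rng):
        # Assumed synthetic return: noisy, highest near gamma = 0.99.
        ret = 1.0 - abs(self.gamma - 0.99) + rng.gauss(0.0, 0.05)
        buffer.add((self.gamma, ret))  # every learner feeds the same buffer
        self.rollouts += 1
        self.mean_return += (ret - self.mean_return) / self.rollouts
        return ret


def ucb_pick(learners, total, c=0.5):
    """Return the learner with the highest UCB1 score; untried learners go first."""
    for learner in learners:
        if learner.rollouts == 0:
            return learner
    return max(
        learners,
        key=lambda l: l.mean_return + c * math.sqrt(math.log(total) / l.rollouts),
    )


def allocate(num_rollouts=200, seed=0):
    """Give out rollouts one at a time, favoring learners that return more."""
    rng = random.Random(seed)
    buffer = SharedReplayBuffer()
    learners = [ToyLearner(g) for g in (0.9, 0.99, 0.997, 0.9995)]
    for t in range(1, num_rollouts + 1):
        ucb_pick(learners, t).rollout(buffer, rng)
    return learners, buffer


if __name__ == "__main__":
    learners, buffer = allocate()
    for learner in learners:
        print(learner.gamma, learner.rollouts, round(learner.mean_return, 3))
```

Each learner's rollout count shows how the budget was split, while the single buffer holds the experience from all of them. In a real system the learners would also draw training batches from that buffer, which this toy only exposes through `sample`.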
