Evolution-Guided Policy Gradient in Reinforcement Learning

Abstract

Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters. Collectively, these challenges severely limit the applicability of these approaches to real-world problems. Evolutionary Algorithms (EAs), a class of black-box optimization techniques inspired by natural evolution, are well suited to address each of these three challenges. However, EAs typically suffer from high sample complexity and struggle to solve problems that require optimization of a large number of parameters. In this paper, we introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA. ERL inherits EA's ability of temporal credit assignment with a fitness metric, effective exploration with a diverse set of policies, and stability of a population-based approach and complements it with off-policy DRL's ability to leverage gradients for higher sample efficiency and faster learning. Experiments in a range of challenging continuous control benchmarks demonstrate that ERL significantly outperforms prior DRL and EA methods.
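
The hybrid loop described above can be illustrated with a minimal structural sketch. It is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the one-step toy task, the scalar linear policies, the quadratic critic, the hyperparameters, and the names `erl_sketch`, `train_rl`, and `rollout`. It shows only the two data flows the abstract names: the population's rollouts fill a shared buffer that trains an off-policy gradient learner, and that learner is periodically copied back into the population.

```python
import random

EVAL_STATES = [i / 10.0 for i in range(-10, 11)]  # fixed states, so fitness is deterministic
W_MAX = 4.0  # assumed weight clamp; it keeps the toy critic's SGD steps stable


def reward(s, a):
    # Toy one-step task (assumption): the best action is 2 * s.
    return -(a - 2.0 * s) * (a - 2.0 * s)


def clamp(w):
    return max(-W_MAX, min(W_MAX, w))


def rollout(w, buffer):
    """Run the linear policy a = w * s, log each transition, return total reward."""
    fitness = 0.0
    for s in EVAL_STATES:
        a = w * s
        r = reward(s, a)
        buffer.append((s, a, r))
        fitness += r
    return fitness


def train_rl(actor_w, critic, buffer, rng, steps=200, critic_lr=0.002, actor_lr=0.05):
    """Off-policy update: regress Q(s, a) on buffered data, then ascend dQ/da."""
    for _ in range(steps):
        s, a, r = rng.choice(buffer)
        feats = (a * a, a * s, s * s)  # quadratic critic features (assumption)
        err = r - sum(c * f for c, f in zip(critic, feats))
        critic = [c + critic_lr * err * f for c, f in zip(critic, feats)]
        a_pi = actor_w * s  # action the current actor would take in state s
        dq_da = 2.0 * critic[0] * a_pi + critic[1] * s
        actor_w = clamp(actor_w + actor_lr * dq_da * s)  # chain rule: da/dw = s
    return actor_w, critic


def erl_sketch(generations=30, pop_size=10, sync_every=5, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-3.0, 3.0) for _ in range(pop_size)]
    actor_w, critic = rng.uniform(-3.0, 3.0), [0.0, 0.0, 0.0]
    buffer, best_history = [], []
    for gen in range(1, generations + 1):
        # 1. EA evaluation: every member's experience lands in the shared buffer.
        scored = sorted(((rollout(w, buffer), w) for w in population), reverse=True)
        best_history.append(scored[0][0])
        # 2. Selection and mutation: keep the elite, refill from the top half.
        elite = scored[0][1]
        parents = [w for _, w in scored[: pop_size // 2]]
        population = [elite] + [
            clamp(rng.choice(parents) + rng.gauss(0.0, 0.2))
            for _ in range(pop_size - 1)
        ]
        # 3. The gradient learner trains on data gathered by the whole population.
        actor_w, critic = train_rl(actor_w, critic, buffer, rng)
        # 4. Periodic reinsertion: overwrite one non-elite slot with the RL actor.
        if gen % sync_every == 0:
            population[-1] = actor_w
    return population, actor_w, best_history
```

Calling `erl_sketch()` returns the final population, the gradient learner's weight, and the best fitness recorded at each generation. Because the elite is carried over unchanged and fitness is deterministic in this toy setup, the recorded best fitness never decreases. Which slot the RL actor overwrites, and how often, are choices made for this sketch only.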
