Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning

Abstract

Continuous control tasks in reinforcement learning are important because they provide a framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped in suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate a high performance gain for our algorithm compared to prior work, including Soft Actor-Critic (SAC).
