Solving the scalarization issues of Advantage-based Reinforcement Learning Algorithms

Abstract

This research investigates some of the issues that arise from the scalarization of the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm. The paper shows how a naive scalarization can lead to overlapping gradients. Furthermore, the possibility that the entropy regularization term can be a source of uncontrolled noise is discussed. To address these issues, a technique to avoid gradient overlapping is proposed that keeps the same loss formulation. Moreover, a method to avoid the uncontrolled noise, by sampling the actions from distributions with a desired minimum entropy, is investigated. A comprehensive pilot experiment is carried out to show how the proposed methods considerably speed up the training. The proposed approach can be applied to any Advantage-based Reinforcement Learning algorithm.
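For readers unfamiliar with the term, the following is a minimal sketch of what a naive scalarization of the A2C objectives commonly looks like: the policy, value, and entropy terms are collapsed into one weighted sum. It illustrates the widely used formulation, not necessarily the exact loss used in the paper; the function name `a2c_scalarized_loss` and the coefficients `value_coef` and `entropy_coef` are assumptions made for this example.

```python
import numpy as np

def a2c_scalarized_loss(log_probs, advantages, values, returns, entropies,
                        value_coef=0.5, entropy_coef=0.01):
    """Weighted-sum (scalarized) A2C loss over a batch.

    All names and default coefficients are illustrative assumptions,
    not values taken from the paper.
    """
    log_probs = np.asarray(log_probs, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    values = np.asarray(values, dtype=float)
    returns = np.asarray(returns, dtype=float)
    entropies = np.asarray(entropies, dtype=float)

    # Objective 1: policy-gradient term (maximize log-prob weighted by advantage).
    policy_loss = -np.mean(log_probs * advantages)
    # Objective 2: critic regression toward the observed returns.
    value_loss = np.mean((returns - values) ** 2)
    # Objective 3: entropy bonus that encourages exploration.
    entropy_bonus = np.mean(entropies)

    # Naive scalarization: one scalar built from three competing objectives.
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
    return total, policy_loss, value_loss, entropy_bonus
```

When the actor and the critic share parameters, differentiating this single scalar sums the gradients of all three terms on the shared weights, which is one plausible reading of the gradient overlapping discussed above.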
