An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning

Abstract

Policy-based reinforcement learning methods suffer from the policy collapse problem. We find that value-based reinforcement learning methods with the ε-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, there is no parallel mechanism for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy regularization free mechanism designed for policy-based methods, which achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that our mechanism is highly sample-efficient for policy-based methods and boosts a policy-based baseline to a new state of the art on the Arcade Learning Environment.
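For reference, the standard ε-greedy rule used by value-based methods has a closed-form action distribution. With a discrete action set A, the greedy action gets probability 1 - ε + ε/|A| and every other action gets ε/|A|. This means every action keeps at least ε/|A| probability whatever the Q-values are. One plausible reading of "Closed-form Diversity" is this guarantee, but that reading is our assumption.

The sketch below is a minimal illustration in Python with NumPy. It assumes a vector of Q-values for a single state. The function names `epsilon_greedy_probs` and `sample_action` are hypothetical and do not come from the paper. The code shows the textbook ε-greedy rule only. It is not the entropy regularization free mechanism proposed in this paper.

```python
import numpy as np


def epsilon_greedy_probs(q_values, epsilon):
    """Closed-form epsilon-greedy distribution over actions for one state.

    Hypothetical helper for illustration: the greedy action receives
    1 - epsilon + epsilon / |A|, every other action receives epsilon / |A|.
    Ties in the Q-values go to the first maximal index (np.argmax behavior).
    """
    q = np.asarray(q_values, dtype=float)
    n_actions = q.shape[0]
    # Uniform exploration mass shared by all actions.
    probs = np.full(n_actions, epsilon / n_actions)
    # Remaining exploitation mass goes to the greedy action.
    probs[np.argmax(q)] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng):
    """Sample an action index from the epsilon-greedy distribution (hypothetical helper)."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = [1.0, 3.0, 2.0]
    print(epsilon_greedy_probs(q, 0.1))
    print(sample_action(q, 0.1, rng))
```

With three actions and ε = 0.1, the greedy action (index 1) gets 0.9 + 0.1/3. The other two actions each get 0.1/3, so the policy never collapses onto a single action while ε > 0.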
