An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning

  • 2021-06-01 18:04:19
  • Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng
  


Policy-based reinforcement learning methods suffer from the policy collapse problem. We find that value-based reinforcement learning methods with an ε-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, there is no parallel mechanism for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy regularization free mechanism designed for policy-based methods, which achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that our mechanism is highly sample-efficient for policy-based methods and boosts a policy-based baseline to a new state of the art on the Arcade Learning Environment.
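
For readers unfamiliar with the ε-greedy mechanism mentioned above, the following is a minimal sketch of standard ε-greedy action selection for a value-based agent. It illustrates only the textbook baseline, not the mechanism proposed in this paper. The function names `epsilon_greedy_probs` and `epsilon_greedy_action` are hypothetical and chosen for illustration. The sketch shows that the probability of every action is available in closed form (ε/n for each action, plus 1 − ε for the greedy one) and that exploration is applied at action-selection time without adding a term to the learning objective; how the paper formally defines its three characteristics is not stated in the abstract.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return the action distribution induced by epsilon-greedy.

    Every action receives epsilon / n probability mass, and the greedy
    action (first argmax of q_values) receives an extra 1 - epsilon.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Sample one action: uniform with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.1, 0.5, 0.2]  # illustrative action values, not from the paper
    print(epsilon_greedy_probs(q, epsilon=0.3))
    print(epsilon_greedy_action(q, epsilon=0.3))
```

Because ε only affects how actions are sampled, it can be changed (for example, annealed) without altering the value-learning loss, which is one reading of why no entropy-style regularizer is needed in the value-based setting.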


