Action Redundancy in Reinforcement Learning

Abstract

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization. However, action entropy does not necessarily coincide with state entropy, e.g., when multiple actions produce the same transition. Instead, we propose to maximize the transition entropy, i.e., the entropy of next states. We show that transition entropy can be described by two terms, namely, model-dependent transition entropy and action redundancy. In particular, we explore the latter in both deterministic and stochastic settings and develop tractable approximation methods in a near model-free setup. We construct algorithms to minimize action redundancy and demonstrate their effectiveness on a synthetic environment with multiple redundant actions as well as contemporary benchmarks in Atari and MuJoCo. Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
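
The gap between action entropy and next-state entropy can be seen in a small numerical sketch. The example below is illustrative only and is not the paper's algorithm: it assumes a hypothetical single state with four actions and a deterministic transition map in which three actions lead to the same next state. The names `transition`, `uniform_policy`, and `reweighted_policy` are made up for this illustration.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def next_state_distribution(policy, transition):
    """Marginalize a policy over a deterministic action -> next-state map."""
    dist = defaultdict(float)
    for action, prob in policy.items():
        dist[transition[action]] += prob
    return dict(dist)


# Hypothetical toy setup: actions a0, a1, a2 are redundant (all reach s1).
transition = {"a0": "s1", "a1": "s1", "a2": "s1", "a3": "s2"}

# The uniform policy maximizes action entropy over the four actions.
uniform_policy = {a: 0.25 for a in transition}
uniform_action_entropy = entropy(uniform_policy.values())
uniform_transition_entropy = entropy(
    next_state_distribution(uniform_policy, transition).values()
)

# A policy that splits probability mass evenly across next states instead.
reweighted_policy = {"a0": 1 / 6, "a1": 1 / 6, "a2": 1 / 6, "a3": 0.5}
reweighted_action_entropy = entropy(reweighted_policy.values())
reweighted_transition_entropy = entropy(
    next_state_distribution(reweighted_policy, transition).values()
)

print(f"uniform:    H(a)={uniform_action_entropy:.3f}  H(s')={uniform_transition_entropy:.3f}")
print(f"reweighted: H(a)={reweighted_action_entropy:.3f}  H(s')={reweighted_transition_entropy:.3f}")
```

Under these assumptions the uniform policy has the highest action entropy (log 4) but concentrates three quarters of its mass on a single next state, while the reweighted policy gives up some action entropy and reaches the maximum next-state entropy (log 2) for this two-outcome case.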
