Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning

Abstract

In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter called the entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and each entropy results in a different class of optimal policy in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs, including the optimality condition, performance error bounds, and convergence. Our theoretical results enable us to use any positive entropic index in RL. To handle complex and large-scale problems, we propose a model-free actor-critic RL method using Tsallis entropy maximization. We evaluate the regularization effect of the Tsallis entropy with various values of the entropic index and show that the entropic index controls the exploration tendency of the proposed method. For different types of RL problems, we find that different values of the entropic index are desirable. The proposed method is evaluated using the MuJoCo simulator and achieves state-of-the-art performance.
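
To make the role of the entropic index concrete, the following is a minimal sketch of how one index q can select different entropies of a discrete policy. It assumes the standard Tsallis entropy from the literature, S_q(p) = (1 - sum_i p_i^q) / (q - 1), with the Shannon-Gibbs entropy as the limit q -> 1. The abstract does not state the formula, so the paper's exact definition and normalization may differ, and the function name `tsallis_entropy` is purely illustrative.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy of a discrete distribution for entropic index q > 0.

    Assumption: the standard definition S_q(p) = (1 - sum_i p_i**q) / (q - 1),
    which reduces to the Shannon-Gibbs entropy -sum_i p_i * ln(p_i) as q -> 1.
    The paper's own normalization is not given in the abstract.
    """
    if q <= 0:
        raise ValueError("entropic index q must be positive")
    if abs(sum(probs) - 1.0) > 1e-9 or any(p < 0 for p in probs):
        raise ValueError("probs must be a valid probability distribution")
    if abs(q - 1.0) < 1e-12:
        # Shannon-Gibbs limit; zero-probability actions contribute nothing.
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


# A uniform policy over four actions, evaluated at several entropic indices.
uniform_policy = [0.25, 0.25, 0.25, 0.25]
entropy_by_index = {q: tsallis_entropy(uniform_policy, q) for q in (0.5, 1.0, 2.0, 50.0)}
```

For the uniform four-action policy, q = 1 gives ln 4 (about 1.386) and q = 2 gives 0.75, while the value shrinks toward zero as q grows. Under this assumed definition, a vanishing entropy bonus at large q is presumably how the framework also covers the original, unregularized RL problem.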
