A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment

Abstract

Empowerment is an information-theoretic method that can be used to intrinsically motivate learning agents. It attempts to maximize an agent's control over the environment by encouraging visiting states with a large number of reachable next states. Empowered learning has been shown to lead to complex behaviors, without requiring an explicit reward signal. In this paper, we investigate the use of empowerment in the presence of an extrinsic reward signal. We hypothesize that empowerment can guide reinforcement learning (RL) agents to find good early behavioral solutions by encouraging highly empowered states. We propose a unified Bellman optimality principle for empowered reward maximization. Our empowered reward maximization approach generalizes both Bellman's optimality principle and recent information-theoretical extensions to it. We prove uniqueness of the empowered values and show convergence to the optimal solution. We then apply this idea to develop off-policy actor-critic RL algorithms, which we validate in high-dimensional continuous robotics domains (MuJoCo). Our methods demonstrate improved initial and competitive final performance compared to model-free state-of-the-art techniques.
