Maximum Total Correlation Reinforcement Learning

Abstract

Simplicity is a powerful inductive bias. In reinforcement learning, regularization is used for simpler policies, data augmentation for simpler representations, and sparse reward functions for simpler objectives, all with the underlying motivation of increasing generalizability and robustness by focusing on the essentials. Supplementary to these techniques, we investigate how to promote simple behavior throughout the episode. To that end, we introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. We propose a practical algorithm that optimizes all models, including policy and state representation, based on a lower-bound approximation. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories and that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance in the original tasks.
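For intuition about the quantity being maximized: the total correlation of a trajectory (X_1, ..., X_T) is the standard multivariate information measure TC = sum_t H(X_t) - H(X_1, ..., X_T). It is zero when the steps are independent and grows as the steps become more predictable from one another. The minimal Python sketch below computes a plug-in estimate of TC from discrete toy trajectories. It is only an illustration under that assumption. It is not the paper's lower-bound approximation, which trains policy and state-representation models, and the names `entropy` and `total_correlation` are hypothetical.

```python
from collections import Counter
import itertools
import math


def entropy(samples):
    """Plug-in Shannon entropy (in nats) of a list of hashable samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def total_correlation(trajectories):
    """Empirical TC(X_1..X_T) = sum_t H(X_t) - H(X_1..X_T).

    `trajectories` is a list of equal-length tuples of discrete states,
    treated as i.i.d. samples of the random sequence (X_1, ..., X_T).
    """
    horizon = len(trajectories[0])
    # Sum of per-timestep marginal entropies.
    marginals = sum(
        entropy([traj[t] for traj in trajectories]) for t in range(horizon)
    )
    # Entropy of the whole trajectory treated as one joint outcome.
    joint = entropy([tuple(traj) for traj in trajectories])
    return marginals - joint


# Independent coin flips: every binary 4-step trajectory is equally likely.
independent = list(itertools.product([0, 1], repeat=4))
# Period-2 pattern with a random phase: the first state fixes all the others.
periodic = [(0, 1, 0, 1), (1, 0, 1, 0)]

print(total_correlation(independent))  # approximately 0.0
print(total_correlation(periodic))     # 3 * ln 2, approximately 2.079
```

In the periodic case, each step alone is as uncertain as a coin flip, but the whole trajectory carries only one bit of uncertainty. This redundancy is what total correlation rewards, and it matches the abstract's observation that the method yields periodic, compressible trajectories.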
