Implicit Policy for Reinforcement Learning

Abstract

We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy regularized policy gradients. We empirically show that, despite its simplicity in implementation, entropy regularization combined with a rich policy class can attain desirable properties displayed under the maximum entropy reinforcement learning framework, such as robustness and multi-modality.
