A note on reinforcement learning with Wasserstein distance regularisation, with applications to multipolicy learning

Abstract

In this note we describe an application of the Wasserstein distance to Reinforcement Learning. The Wasserstein distance in question is between the distribution of mappings of trajectories of a policy into some metric space and some other fixed distribution (which may, for example, come from another policy). Different policies induce different distributions, so given an underlying metric, the Wasserstein distance quantifies how different the policies are. This can be used to learn multiple policies that differ in terms of such Wasserstein distances by using a Wasserstein regulariser. By changing the sign of the regularisation parameter, one can instead learn a policy whose trajectory mapping distribution is attracted to a given fixed distribution.
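
As a rough illustration of the regulariser described above, the following sketch assumes the simplest possible setting: each trajectory is mapped to a single real number, both distributions are represented by equally many samples, and the objective is the mean return plus a signed multiple of the Wasserstein distance. The function names, the one-dimensional embedding, and the sign convention (positive `lam` rewards distance from the reference, negative `lam` penalises it) are assumptions made for this example and are not taken from the note itself.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two empirical distributions on the
    real line, assuming both are given by the same number of samples.
    In one dimension the optimal coupling matches sorted samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs_sorted, ys_sorted)) / len(xs)


def regularised_objective(returns, embeddings, reference, lam):
    """Mean return plus lam times the Wasserstein distance between the
    policy's trajectory embeddings and a fixed reference sample.
    Assumed convention: lam > 0 pushes the policy away from the
    reference, lam < 0 attracts it towards the reference."""
    mean_return = sum(returns) / len(returns)
    return mean_return + lam * wasserstein_1d(embeddings, reference)


if __name__ == "__main__":
    returns = [1.0, 3.0]        # hypothetical episode returns
    embeddings = [0.0, 1.0]     # hypothetical trajectory embeddings
    reference = [2.0, 3.0]      # samples from the fixed distribution
    print(regularised_objective(returns, embeddings, reference, lam=0.5))
    print(regularised_objective(returns, embeddings, reference, lam=-0.5))
```

With a positive `lam`, a policy whose embeddings lie far from the reference sample scores higher, which is one way to encourage several policies to differ from one another; with a negative `lam` the same distance is penalised. Embeddings in a general metric space would need a full optimal transport solver rather than the sorting shortcut used here.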
