Abstract
Unsupervised reinforcement learning (URL) aims to learn general skills for unseen downstream tasks. Mutual Information Skill Learning (MISL) addresses URL by maximizing the mutual information between states and skills but lacks sufficient theoretical analysis, e.g., how well its learned skills can initialize a downstream task's policy. Our new theoretical analysis in this paper shows that the diversity and separability of learned skills are fundamentally critical to downstream task adaptation but MISL does not necessarily guarantee these properties. To complement MISL, we propose a novel disentanglement metric LSEPIN. Moreover, we build an information-geometric connection between LSEPIN and downstream task adaptation cost. For better geometric properties, we investigate a new strategy that replaces the KL divergence in information geometry with Wasserstein distance. We extend the geometric analysis to it, which leads to a novel skill-learning objective WSEP. It is theoretically justified to be helpful to downstream task adaptation and it is capable of discovering more initial policies for downstream tasks than MISL. We finally propose another Wasserstein distance-based algorithm PWSEP that can theoretically discover all optimal initial policies.