Abstract
It is important for an agent to learn a widely applicable, general-purpose policy that can achieve diverse goals, including images and text descriptions. For such perceptually-specific goals, the frontier of deep reinforcement learning research is to learn a goal-conditioned policy without hand-crafted rewards. To learn this kind of policy, recent works usually take as the reward the non-parametric distance to a given goal in an explicit embedding space. From a different viewpoint, we propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM), which jointly learns both an abstract-level policy and a goal-conditioned policy. The abstract-level policy is conditioned on a latent variable to optimize a discriminator and discovers diverse states that are further rendered into perceptually-specific goals for the goal-conditioned policy. The learned discriminator serves as an intrinsic reward function for the goal-conditioned policy to imitate the trajectory induced by the abstract-level policy. Experiments on various robotic tasks demonstrate the effectiveness and efficiency of our proposed GPIM method, which substantially outperforms prior techniques.