Task Transfer by Preference-Based Cost Learning

Abstract

The goal of task transfer in reinforcement learning is to migrate the action policy of an agent from a source task to the target task. Despite their successes in robotic action planning, current methods mostly rely on two requirements: exactly relevant expert demonstrations or an explicitly coded cost function on the target task, both of which are inconvenient to obtain in practice. In this paper, we relax these two strong conditions by developing a novel task transfer framework in which expert preference is applied as guidance. In particular, we alternate the following two steps. First, experts apply pre-defined preference rules to select related expert demonstrations for the target task. Second, based on the selection result, we learn the target cost function and trajectory distribution simultaneously via enhanced Adversarial MaxEnt IRL, and generate more trajectories from the learned target distribution for the next preference selection. We provide a theoretical analysis of the distribution learning and convergence of the proposed algorithm. Extensive simulations on several benchmarks further verify the effectiveness of the proposed method.
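
The alternation described above can be illustrated with a toy sketch. This is not the paper's algorithm: the enhanced Adversarial MaxEnt IRL step is replaced here by a much simpler stand-in (refitting the mean of a Gaussian trajectory distribution to the selected trajectories), and every name and parameter (`prefer`, `sample_trajectories`, `transfer`, `target`, `tolerance`) is hypothetical. It only shows how repeated preference selection and refitting might shift a source distribution toward a target one.

```python
import random
import statistics


def prefer(trajectory, target=2.0, tolerance=1.0):
    """Hypothetical preference rule: accept a trajectory whose mean state
    lies within `tolerance` of the target value."""
    return abs(statistics.fmean(trajectory) - target) <= tolerance


def sample_trajectories(mu, sigma, n, horizon, rng):
    """Draw n toy trajectories of 1-D states from a Gaussian N(mu, sigma)."""
    return [[rng.gauss(mu, sigma) for _ in range(horizon)] for _ in range(n)]


def transfer(source_mu=0.0, sigma=1.0, rounds=20, n=500, horizon=5, seed=0):
    """Alternate preference selection and distribution refitting.

    Assumption: the refit below (a plain mean over selected states) is a
    stand-in for the cost and distribution learning step, not MaxEnt IRL.
    """
    rng = random.Random(seed)
    mu = source_mu
    for _ in range(rounds):
        # Step 1: generate candidates and let the preference rule select.
        candidates = sample_trajectories(mu, sigma, n, horizon, rng)
        selected = [t for t in candidates if prefer(t)]
        if not selected:
            continue  # nothing matched this round; resample next round
        # Step 2: refit the trajectory distribution to the selected set.
        mu = statistics.fmean([x for t in selected for x in t])
    return mu


if __name__ == "__main__":
    print(transfer())
```

In this sketch the distribution mean starts at the source value 0.0 and drifts toward the preferred region around 2.0 as the rounds proceed, because each refit uses only the trajectories that passed selection.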
