Abstract
Reinforcement learning (RL) with diverse offline datasets can leverage the relations among multiple tasks and the common skills learned across those tasks, allowing us to deal with complex real-world problems efficiently in a data-driven way. In offline RL, where only offline data is used and online interaction with the environment is restricted, it remains difficult to achieve the optimal policy for multiple tasks, especially when the data quality varies across tasks. In this paper, we present a skill-based multi-task RL technique on heterogeneous datasets that are generated by behavior policies of different quality. To learn the shareable knowledge across those datasets effectively, we employ a task decomposition method in which common skills are jointly learned and used as guidance to reformulate a task into shared and achievable subtasks. In this joint learning, we use a Wasserstein auto-encoder (WAE) to represent both skills and tasks in the same latent space, and use a quality-weighted loss as a regularization term to induce tasks to be decomposed into subtasks that are more consistent with high-quality skills than with others. To improve the performance of offline RL agents learned in the latent space, we also augment the datasets with imaginary trajectories relevant to high-quality skills for each task. Through experiments, we show that our multi-task offline RL approach is robust to mixed configurations of different-quality datasets and that it outperforms other state-of-the-art algorithms on several robotic manipulation tasks and drone navigation tasks.