MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics

Abstract

Transfer reinforcement learning (RL) aims at improving learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks. However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments. In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently. To address this problem, the proposed approach, MULTI-source POLicy AggRegation (MULTIPOLAR), comprises two key techniques. We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance. Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly. We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces.
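
The two techniques named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not a detail stated in the abstract: the element-wise weight matrix `weights`, the plain average over sources, the additive combination with the residual, and the names `aggregate_actions`, `target_action`, and `residual_fn` are all hypothetical. Training of the weights and of the auxiliary network with an RL algorithm is not shown.

```python
import numpy as np

def aggregate_actions(source_actions, weights):
    """Combine K source actions (shape K x D) using weights of the same shape.

    Assumption: aggregation is an element-wise weighted average over sources.
    """
    return (weights * source_actions).mean(axis=0)

def target_action(state, source_policies, weights, residual_fn):
    """Aggregated source action plus a residual predicted from the state.

    Assumption: the residual is simply added to the aggregated action.
    """
    # Query each fixed source policy at the current state.
    source_actions = np.stack([policy(state) for policy in source_policies])
    return aggregate_actions(source_actions, weights) + residual_fn(state)

# Hypothetical toy setup: three source policies over a 2-D action space.
source_policies = [
    lambda s: 1.0 * s,            # a source policy that happens to be useful
    lambda s: -0.5 * s,           # a source policy that performs poorly
    lambda s: np.zeros_like(s),   # an uninformative source policy
]
weights = np.ones((3, 2))                 # would be learned; uniform here
residual_fn = lambda s: np.zeros_like(s)  # stand-in for the auxiliary network

state = np.array([1.0, 2.0])
action = target_action(state, source_policies, weights, residual_fn)
```

In this sketch, shrinking the weights of a poorly performing source suppresses its contribution, while the residual term lets the target policy produce actions that no weighted combination of the sources could reach.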
