Abstract
Transfer reinforcement learning aims to derive a near-optimal policy for a target environment with limited data by leveraging abundant data from related source domains. However, it faces two key challenges: the lack of performance guarantees for the transferred policy, which can lead to undesired actions, and the risk of negative transfer when multiple source domains are involved. We propose a novel framework based on the pessimism principle, which constructs and optimizes a conservative estimation of the target domain's performance. Our framework effectively addresses the two challenges by providing an optimized lower bound on target performance, ensuring safe and reliable decisions, and by exhibiting monotonic improvement with respect to the quality of the source domains, thereby avoiding negative transfer. We construct two types of conservative estimations, rigorously characterize their effectiveness, and develop efficient distributed algorithms with convergence guarantees. Our framework provides a theoretically sound and practically robust solution for transfer learning in reinforcement learning.