Abstract
Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline datasets to enhance the agent's generalization ability on unseen tasks. However, the context shift problem arises due to the distribution discrepancy between the contexts used for training (from the behavior policy) and testing (from the exploration policy). The context shift problem leads to incorrect task inference and further deteriorates the generalization ability of the meta-policy. Existing OMRL methods either overlook this problem or attempt to mitigate it with additional information. In this paper, we propose a novel approach called Context Shift Reduction for OMRL (CSRO) to address the context shift problem with only offline datasets. The key insight of CSRO is to minimize the influence of the policy on the context during both the meta-training and meta-test phases. During meta-training, we design a max-min mutual information representation learning mechanism to diminish the impact of the behavior policy on task representation. In the meta-test phase, we introduce the non-prior context collection strategy to reduce the effect of the exploration policy. Experimental results demonstrate that CSRO significantly reduces the context shift and improves the generalization ability, surpassing previous methods across various challenging domains.