Abstract
Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to any unseen task in an environment after reward-free pre-training. Access to Markov states is one such assumption, yet, in many real-world applications, the Markov state is only partially observable. Here, we explore how the performance of standard zero-shot RL methods degrades when subjected to partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our memory-based zero-shot RL methods in domains where the states, rewards and a change in dynamics are partially observed, and show improved performance over memory-free baselines. Our code is open-sourced via: https://enjeeneer.io/projects/bfms-with-memory/.