Augmenting Unsupervised Reinforcement Learning with Self-Reference

Abstract

Humans can explicitly draw on past experiences when learning new tasks and apply them accordingly. We believe this capacity for self-referencing is especially advantageous for reinforcement learning agents in the unsupervised pretrain-then-finetune setting. During pretraining, an agent's past experiences can be used explicitly to mitigate the nonstationarity of intrinsic rewards. During finetuning, referencing historical trajectories prevents the agent from unlearning valuable exploratory behaviors. Motivated by these benefits, we propose Self-Reference (SR), an add-on module explicitly designed to leverage historical information and improve agent performance within the pretrain-finetune paradigm. Our approach achieves state-of-the-art Interquartile Mean (IQM) performance and Optimality Gap reduction among model-free methods on the Unsupervised Reinforcement Learning Benchmark, recording an 86% IQM and a 16% Optimality Gap. Additionally, it improves existing algorithms by up to 17% IQM and reduces the Optimality Gap by 31%. Beyond raising performance, the Self-Reference add-on also increases sample efficiency, a crucial attribute for real-world applications.
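The results above are reported with two aggregate metrics, IQM and Optimality Gap. The sketch below shows how these metrics are commonly computed in the RL evaluation literature (for example, in the rliable library). It is an illustration under stated assumptions, not the authors' evaluation code. It assumes scores are already normalized so that 1.0 corresponds to the reference (expert) score, and it uses a hypothetical target threshold `gamma = 1.0` for the Optimality Gap. The function names and the sample scores are hypothetical and chosen only for illustration.

```python
from statistics import mean


def interquartile_mean(scores):
    """Mean of the scores left after dropping the lowest and highest 25%.

    Equivalent to a 25% trimmed mean: int(0.25 * n) scores are removed
    from each end of the sorted list before averaging.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    k = int(0.25 * len(ordered))  # number of scores trimmed from each end
    return mean(ordered[k:len(ordered) - k])


def optimality_gap(scores, gamma=1.0):
    """Average amount by which scores fall short of the target gamma.

    Scores above gamma are clipped to gamma, so exceeding the target on
    one run cannot compensate for falling short on another.
    gamma=1.0 is an assumed default (normalized expert score).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    return gamma - mean(min(s, gamma) for s in scores)


if __name__ == "__main__":
    # Hypothetical normalized scores (runs x tasks flattened); not real data.
    demo = [0.2, 0.5, 0.8, 0.9, 1.0, 1.1, 1.3, 0.4]
    # IQM averages the middle four sorted scores: 0.5, 0.8, 0.9, 1.0.
    print("IQM:", interquartile_mean(demo))
    print("Optimality Gap:", optimality_gap(demo))
```

Higher IQM is better, while a lower Optimality Gap is better. Trimming the tails makes IQM less sensitive to outlier runs than the mean, and clipping at `gamma` makes the Optimality Gap measure only shortfall from the target.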
