Abstract
A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent's learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation tasks, outperforming baselines with significantly fewer manual resets.