S-EPOA: Overcoming the Indivisibility of Annotations with Skill-Driven Preference-Based Reinforcement Learning

Abstract

Preference-based reinforcement learning (PbRL) stands out by utilizing human preferences as a direct reward signal, eliminating the need for intricate reward engineering. However, despite its potential, traditional PbRL methods are often constrained by the indivisibility of annotations, which impedes the learning process. In this paper, we introduce a groundbreaking approach, Skill-Enhanced Preference Optimization Algorithm (S-EPOA), which addresses the annotation indivisibility issue by integrating skill mechanisms into the preference learning framework. Specifically, we first conduct unsupervised pretraining to learn useful skills. Then, we propose a novel query selection mechanism to balance information gain and discriminability over the learned skill space. Experimental results on a range of tasks, including robotic manipulation and locomotion, demonstrate that S-EPOA significantly outperforms conventional PbRL methods in terms of both robustness and learning efficiency. The results highlight the effectiveness of skill-driven learning in overcoming the challenges posed by annotation indivisibility.
