SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning

Abstract

Preference-based Reinforcement Learning (PbRL) methods avoid reward engineering by learning reward models from human preferences. However, poor feedback efficiency and sample efficiency still hinder the application of PbRL. In this paper, we present a novel efficient query selection and preference-guided exploration method, called SENIOR, which selects meaningful and easy-to-compare behavior segment pairs to improve human feedback efficiency and accelerates policy learning with designed preference-guided intrinsic rewards. Our key idea is twofold: (1) We design a Motion-Distinction-based Selection scheme (MDS). It selects segment pairs with apparent motion and different directions through kernel density estimation of states, which makes the pairs more task-related and easier for humans to label. (2) We propose a novel preference-guided exploration method (PGE). It encourages exploration towards states with high preference and low visit counts, and continuously guides the agent toward valuable samples. The synergy between the two mechanisms can significantly accelerate the progress of reward and policy learning. Our experiments show that SENIOR outperforms five other existing methods in both human feedback efficiency and policy convergence speed on six complex robot manipulation tasks in simulation and four in the real world.
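To make the MDS idea concrete, the following is a minimal sketch of how segment pairs might be ranked by "apparent motion" and "different directions" using a kernel density estimate of states. It is not the authors' implementation: the function names, the isotropic Gaussian kernel, the `bandwidth` value, and the way the two criteria are combined into one score are all assumptions made for illustration.

```python
import math
from itertools import combinations

def gaussian_kde_density(points, query, bandwidth=0.5):
    """Isotropic Gaussian KDE of `points`, evaluated at `query`."""
    dim = len(query)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, query))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(points) * norm)

def motion_score(segment, bandwidth=0.5):
    """Assumed proxy for 'apparent motion': states that are spread out
    have low mean self-density, so the inverse density is large."""
    densities = [gaussian_kde_density(segment, s, bandwidth) for s in segment]
    return len(densities) / sum(densities)

def direction_difference(seg_a, seg_b):
    """Assumed proxy for 'different directions': (1 - cosine) / 2 of the
    net displacements, in [0, 1]. Zero displacement counts as no difference."""
    disp_a = [e - s for s, e in zip(seg_a[0], seg_a[-1])]
    disp_b = [e - s for s, e in zip(seg_b[0], seg_b[-1])]
    norm_a = math.sqrt(sum(x * x for x in disp_a))
    norm_b = math.sqrt(sum(x * x for x in disp_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    cosine = sum(x * y for x, y in zip(disp_a, disp_b)) / (norm_a * norm_b)
    return (1.0 - cosine) / 2.0

def select_pairs(segments, num_pairs, bandwidth=0.5):
    """Rank all index pairs by a hypothetical combined score and keep the top ones."""
    motion = [motion_score(seg, bandwidth) for seg in segments]
    scored = []
    for i, j in combinations(range(len(segments)), 2):
        score = min(motion[i], motion[j]) * direction_difference(segments[i], segments[j])
        scored.append((score, (i, j)))
    scored.sort(key=lambda item: -item[0])
    return [pair for _, pair in scored[:num_pairs]]

# Toy 2-D states: one stationary segment and three moving ones.
still = [(0.0, 0.0)] * 5
right = [(float(t), 0.0) for t in range(5)]
left = [(-float(t), 0.0) for t in range(5)]
up = [(0.0, float(t)) for t in range(5)]
best = select_pairs([still, right, left, up], num_pairs=1)
```

In this toy setup the highest-ranked pair is the two segments moving in opposite directions, while any pair containing the stationary segment scores zero. A real system would operate on robot states (for example end-effector positions), and the actual scoring rule would come from the full paper.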
