Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences

Abstract

Preference-based reinforcement learning (PBRL) learns directly from the preferences of human teachers regarding agent behaviors, without needing meticulously designed reward functions. However, existing PBRL methods often learn primarily from explicit preferences, neglecting the possibility that teachers may choose equal preferences. This neglect may hinder the agent's understanding of the teacher's perspective on the task, leading to the loss of important information. To address this issue, we introduce the Equal Preference Learning Task, which optimizes the neural network by promoting similar reward predictions when the behaviors of two agents are labeled as equally preferred. Building on this task, we propose a novel PBRL method, Multi-Type Preference Learning (MTPL), which allows simultaneous learning from equal preferences while leveraging existing methods for learning from explicit preferences. To validate our approach, we design experiments applying MTPL to four existing state-of-the-art baselines across ten locomotion and robotic manipulation tasks in the DeepMind Control Suite. The experimental results indicate that simultaneous learning from both equal and explicit preferences enables the PBRL method to more comprehensively understand the feedback from teachers, thereby enhancing feedback efficiency. Project page: https://github.com/FeiCuiLengMMbb/paper_MTPL
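
The idea of combining the two feedback types can be illustrated with a minimal sketch. The abstract does not state the loss functions, so everything below is an assumption made for illustration: explicit preferences use a Bradley-Terry cross-entropy over summed predicted rewards (a common choice in PBRL), and equal preferences use a squared difference between the two summed predictions. The function names and the `equal_weight` parameter are hypothetical and not taken from the paper.

```python
import math


def segment_return(rewards):
    """Sum of predicted per-step rewards for one behavior segment."""
    return sum(rewards)


def explicit_preference_loss(r0, r1, label):
    """Bradley-Terry cross-entropy (assumed form).

    label is 0 if segment 0 is preferred, 1 if segment 1 is preferred.
    """
    s0, s1 = segment_return(r0), segment_return(r1)
    # Numerically stable log-sum-exp over the two segment returns.
    m = max(s0, s1)
    log_z = m + math.log(math.exp(s0 - m) + math.exp(s1 - m))
    chosen = s0 if label == 0 else s1
    return log_z - chosen


def equal_preference_loss(r0, r1):
    """Assumed form: penalize the gap between the two predicted returns."""
    return (segment_return(r0) - segment_return(r1)) ** 2


def multi_type_loss(r0, r1, label, equal_weight=1.0):
    """Dispatch on the label type; equal_weight is a hypothetical knob."""
    if label == "equal":
        return equal_weight * equal_preference_loss(r0, r1)
    return explicit_preference_loss(r0, r1, label)
```

In this sketch, a pair labeled `"equal"` contributes zero loss only when the two predicted returns match, while explicit labels push the preferred segment's return above the other's.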
