Reinforcement Learning from Diverse Human Preferences

Abstract

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new paradigm called reinforcement learning from human preferences (or preference-based RL) has emerged as a promising solution, in which reward functions are learned from human preference labels among behavior trajectories. However, existing methods for preference-based RL are limited by the need for accurate oracle preference labels. This paper addresses this limitation by developing a method for crowd-sourcing preference labels and learning from diverse human preferences. The key idea is to stabilize reward learning through regularization and correction in a latent space. To ensure temporal consistency, a strong constraint is imposed on the reward model that forces its latent space to be close to the prior distribution. Additionally, a confidence-based reward model ensembling method is designed to generate more stable and reliable predictions. The proposed method is tested on a variety of tasks in DMcontrol and Meta-world and has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback, paving the way for real-world applications of RL methods.
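To make the two ingredients named above more concrete, the following is a minimal sketch, not the paper's implementation. It assumes the Bradley-Terry preference model that is common in preference-based RL, a diagonal Gaussian latent, and a standard normal prior; the abstract specifies none of these, and all function names are hypothetical.

```python
import math


def segment_return(rewards):
    """Sum of predicted per-step rewards over one trajectory segment."""
    return sum(rewards)


def preference_prob(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    Assumption: the abstract does not name the preference model; this is
    the form commonly used in preference-based RL.
    """
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    # Numerically stable logistic function.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy for one labelled pair.

    label is 1.0 if the annotator preferred A and 0.0 if they preferred B.
    """
    p = preference_prob(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions.

    Assumption: a Gaussian latent with a standard normal prior, chosen only
    to illustrate a constraint that pulls the latent toward a prior.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
```

Under these assumptions, a training objective would add a weighted `kl_to_standard_normal` term to `preference_loss`; a larger weight constrains the latent more strongly toward the prior.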
