Similarity as Reward Alignment: Robust and Versatile Preference-based Reinforcement Learning

Abstract

Preference-based Reinforcement Learning (PbRL) entails a variety of approaches for aligning models with human intent to alleviate the burden of reward engineering. However, most previous PbRL work has not investigated the robustness to labeler errors, inevitable with labelers who are non-experts or operate under time constraints. Additionally, PbRL algorithms often target very specific settings (e.g., pairwise ranked preferences or purely offline learning). We introduce Similarity as Reward Alignment (SARA), a simple contrastive framework that is both resilient to noisy labels and adaptable to diverse feedback formats and training paradigms. SARA learns a latent representation of preferred samples and computes rewards as similarities to the learned latent. We demonstrate strong performance compared to baselines on continuous control offline RL benchmarks. We further demonstrate SARA's versatility in applications such as trajectory filtering for downstream tasks, cross-task preference transfer, and reward shaping in online learning.
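
To make the idea of "rewards as similarities to the learned latent" concrete, the following is a minimal sketch and not the paper's implementation. It assumes that sample embeddings are already available from some encoder, that the preferred latent can be approximated by the renormalized mean of unit-normalized preferred embeddings, and that similarity is measured with cosine similarity. None of these choices is stated in the abstract, and the contrastive training of the encoder is omitted entirely. The function names `preferred_latent` and `similarity_reward` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def preferred_latent(preferred_embeddings):
    """Assumed stand-in for a learned latent: the renormalized mean
    of unit-normalized embeddings of preferred samples."""
    unit = [normalize(e) for e in preferred_embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


def similarity_reward(embedding, latent):
    """Cosine similarity between a sample embedding and the latent.
    Assumes `latent` is already unit-length."""
    e = normalize(embedding)
    return sum(a * b for a, b in zip(e, latent))


# Toy 2-D embeddings (made up for illustration only).
preferred = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1]]
latent = preferred_latent(preferred)

r_close = similarity_reward([0.8, 0.05], latent)  # points the same way as the preferred set
r_far = similarity_reward([-1.0, 0.2], latent)    # points roughly the opposite way
```

In this toy setting a sample whose embedding aligns with the preferred set receives a higher reward than one pointing away from it, which is the qualitative behavior the abstract describes.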
