Advances in Preference-based Reinforcement Learning: A Review

Abstract

Reinforcement learning (RL) algorithms depend on accurately engineered reward functions to guide learning agents toward the required tasks. Preference-based reinforcement learning (PbRL) addresses this dependency by using human preferences as expert feedback instead of numeric rewards. Because of this advantage over traditional RL, PbRL has gained increasing attention in recent years, with many significant advances. In this survey, we present a unified PbRL framework that includes the newly emerging approaches that improve the scalability and efficiency of PbRL. In addition, we give a detailed overview of the theoretical guarantees and benchmarking work done in the field, and present its recent applications in complex real-world tasks. Lastly, we discuss the limitations of current approaches and the proposed directions for future research.
