Multi-Task Reward Learning from Human Ratings

Abstract

Reinforcement learning from human feedback (RLHF) has become a key factor inaligning model behavior with users' goals. However, while humans integratemultiple strategies when making decisions, current RLHF approaches oftensimplify this process by modeling human reasoning through isolated tasks suchas classification or regression. In this paper, we propose a novelreinforcement learning (RL) method that mimics human decision-making by jointlyconsidering multiple tasks. Specifically, we leverage human ratings inreward-free environments to infer a reward function, introducing learnableweights that balance the contributions of both classification and regressionmodels. This design captures the inherent uncertainty in human decision-makingand allows the model to adaptively emphasize different strategies. We conductseveral experiments using synthetic human ratings to validate the effectivenessof the proposed approach. Results show that our method consistently outperformsexisting rating-based RL methods, and in some cases, even surpasses traditionalRL approaches.

Quick Read (beta)

loading the full paper ...