Preference-based Multi-Objective Reinforcement Learning

Abstract

Multi-objective reinforcement learning (MORL) is a structured approach for optimizing tasks with multiple objectives. However, it often relies on pre-defined reward functions, which can be hard to design for balancing conflicting goals and may lead to oversimplification. Preferences can serve as more flexible and intuitive decision-making guidance, eliminating the need for complicated reward design. This paper introduces preference-based MORL (Pb-MORL), which formalizes the integration of preferences into the MORL framework. We theoretically prove that preferences can derive policies across the entire Pareto frontier. To guide policy optimization using preferences, our method constructs a multi-objective reward model that aligns with the given preferences. We further provide theoretical proof to show that optimizing this reward model is equivalent to training the Pareto optimal policy. Extensive experiments in benchmark multi-objective tasks, a multi-energy management task, and an autonomous driving task on a multi-line highway show that our method performs competitively, surpassing the oracle method, which uses the ground truth reward function. This highlights its potential for practical applications in complex real-world systems.
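
To make the idea of a reward model that "aligns with the given preferences" concrete, the following is a minimal sketch and not the paper's formulation. The abstract does not state the form of the reward model, so the sketch assumes a Bradley-Terry preference model over linearly scalarized segment returns, a common choice in preference-based RL. All function names (`scalarize`, `preference_probability`, `preference_loss`) and the example reward vectors are hypothetical.

```python
import math

# Assumption: each segment is a list of per-step reward vectors produced by a
# multi-objective reward model, and a weight vector encodes the trade-off
# between objectives. Neither detail is given in the abstract.

def scalarize(reward_vectors, weights):
    """Return the weighted sum of multi-objective rewards over a segment."""
    return sum(
        sum(w * r for w, r in zip(weights, reward))
        for reward in reward_vectors
    )

def preference_probability(segment_a, segment_b, weights):
    """Bradley-Terry probability that segment_a is preferred to segment_b."""
    score_a = scalarize(segment_a, weights)
    score_b = scalarize(segment_b, weights)
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def preference_loss(segment_a, segment_b, weights, label):
    """Cross-entropy loss; label is 1.0 if segment_a is preferred, else 0.0."""
    p = preference_probability(segment_a, segment_b, weights)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))

# Two hypothetical segments: one favors objective 0, the other objective 1.
segment_a = [(1.0, 0.0), (1.0, 0.0)]
segment_b = [(0.0, 1.0), (0.0, 1.0)]

p_first = preference_probability(segment_a, segment_b, (0.8, 0.2))
p_second = preference_probability(segment_a, segment_b, (0.2, 0.8))
```

Under these assumptions, the same pair of segments is ranked differently depending on the weight vector: `p_first` is above 0.5 and `p_second` is below it. Minimizing `preference_loss` over labeled pairs would push the modeled reward vectors toward agreement with the preferences.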
