Abstract
Multi-objective reinforcement learning (MORL) plays a pivotal role in addressing real-world multi-criteria decision-making problems. Multi-policy (MP) based methods are widely used to obtain high-quality approximations of the Pareto front for MORL problems. However, traditional MP methods rely only on online reinforcement learning (RL) and adopt an evolutionary framework with a large policy population. In practice, this may lead to sample inefficiency and/or an excessive number of agent-environment interactions. Forsaking the evolutionary framework, we propose a novel Multi-policy Pareto Front Tracking (MPFT) framework that does not maintain any policy population and in which both online and offline MORL algorithms can be applied. The proposed MPFT framework consists of four stages. Stage 1 approximates all the Pareto-vertex policies, whose mappings to the objective space fall on the vertices of the Pareto front. Stage 2 designs a new Pareto tracking mechanism that tracks the Pareto front starting from each of the Pareto-vertex policies. Stage 3 identifies the sparse regions in the tracked Pareto front and introduces a new objective-weight adjustment method to fill them. Finally, Stage 4 approximates the Pareto front by combining all the policies tracked in Stages 2 and 3. Experiments on seven different continuous-action robotic control tasks, with both online and offline MORL algorithms, demonstrate the superior hypervolume performance of the proposed MPFT approach over state-of-the-art benchmarks, with significantly fewer agent-environment interactions and lower hardware requirements.
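Hypervolume, the metric used to compare methods above, measures the region of objective space dominated by an approximated Pareto front and bounded by a reference point. The following is a minimal sketch for the two-objective case only, assuming both objectives (e.g. expected returns) are maximized. It first keeps the non-dominated return vectors, which is one simple way to turn a set of policies' returns into a front approximation, and then computes the dominated area. The function names `pareto_filter` and `hypervolume_2d` and the reference point are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both maximized


def pareto_filter(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front: List[Point] = []
    for p in points:
        # p is dominated if some q is at least as good in both objectives
        # and strictly better in at least one.
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated and p not in front:  # also drop exact duplicates
            front.append(p)
    return sorted(front)


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded below by the reference point."""
    # Only points strictly better than the reference contribute any area.
    front = [p for p in pareto_filter(points) if p[0] > ref[0] and p[1] > ref[1]]
    # On a 2-D front, descending first objective means ascending second objective.
    front.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    prev_y = ref[1]
    for x, y in front:
        # Add the horizontal strip between the previous and current heights.
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


if __name__ == "__main__":
    returns = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
    print(pareto_filter(returns))  # (1.0, 1.0) is dominated by (2.0, 2.0)
    print(hypervolume_2d(returns, ref=(0.0, 0.0)))
```

For three or more objectives, exact hypervolume computation is considerably more involved, and a dedicated library implementation would normally be used instead of this sweep.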