Abstract
An important challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies to attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), which decomposes finding the Pareto front into a sequence of constrained single-objective problems. This enables us to guarantee convergence while providing an upper bound on the distance to undiscovered Pareto optimal solutions at each step. We evaluate IPRO using utility-based metrics and its hypervolume and find that it matches or outperforms methods that require additional assumptions. By leveraging problem-specific single-objective solvers, our approach also holds promise for applications beyond multi-objective reinforcement learning, such as planning and pathfinding.