Meta-Learning for Multi-objective Reinforcement Learning

Abstract

Multi-objective reinforcement learning (MORL) is the generalization of standard reinforcement learning (RL) approaches to sequential decision-making problems that consist of several, possibly conflicting, objectives. Generally, in such formulations, there is no single optimal policy that optimizes all the objectives simultaneously; instead, a number of policies have to be found, each optimizing a particular preference over the objectives. In this paper, we introduce a novel MORL approach by training a meta-policy, a policy simultaneously trained with multiple tasks sampled from a task distribution, for a number of randomly sampled Markov decision processes (MDPs). In other words, MORL is framed as a meta-learning problem, with the task distribution given by a distribution over the preferences. We demonstrate that such a formulation results in a better approximation of the Pareto optimal solutions, in terms of both optimality and computational efficiency. We evaluated our method on obtaining Pareto optimal policies using a number of continuous control problems with high degrees of freedom.
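
To make the framing concrete, the following is a minimal sketch of how a distribution over preferences can define a task distribution. The abstract does not say which preference distribution or scalarization is used, so both choices here are assumptions made for illustration: preferences are drawn from a symmetric Dirichlet distribution on the simplex, and a vector-valued reward is reduced to a scalar by a weighted sum. The names `sample_preference`, `scalarize`, and `sample_tasks` are hypothetical and do not come from the paper.

```python
import random


def sample_preference(num_objectives, rng, alpha=1.0):
    """Draw one preference vector from a symmetric Dirichlet(alpha).

    Assumption: the paper's preference distribution is not given in the
    abstract; a Dirichlet is used only because it lives on the simplex.
    """
    # Normalized independent Gamma(alpha, 1) draws are Dirichlet distributed.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(reward_vector, preference):
    """Weighted-sum scalarization of a vector reward (an assumed choice)."""
    if len(reward_vector) != len(preference):
        raise ValueError("reward and preference must have the same length")
    return sum(w * r for w, r in zip(preference, reward_vector))


def sample_tasks(num_tasks, num_objectives, seed=0):
    """Sample a batch of tasks; each task is identified by its preference."""
    rng = random.Random(seed)
    return [sample_preference(num_objectives, rng) for _ in range(num_tasks)]
```

Under these assumptions, each sampled preference vector induces one single-objective MDP with the scalarized reward, and a meta-learning procedure would train across a batch such as `sample_tasks(5, 3)` before adapting to any specific preference.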
