Meta-Learning for Multi-objective Reinforcement Learning

Abstract

Multi-objective reinforcement learning (MORL) is the generalization of standard reinforcement learning (RL) approaches to sequential decision-making problems that consist of several, possibly conflicting, objectives. Generally, in such formulations, there is no single optimal policy that optimizes all the objectives simultaneously; instead, a number of policies have to be found, each optimizing a preference over the objectives. We frame MORL as a meta-learning problem, with the task distribution given by a distribution over the preferences. We demonstrate that such a formulation results in a better approximation of the Pareto optimal solutions in terms of both optimality and computational efficiency. We evaluate our method on obtaining Pareto optimal policies using a number of continuous control problems with high degrees of freedom.
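
To make the idea of a task distribution over preferences concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not state: preferences are treated as non-negative weight vectors summing to one, they are drawn uniformly from the simplex, and a vector reward is turned into a scalar by a weighted sum (linear scalarization). The names `sample_preference` and `scalarize` are hypothetical and are not taken from the paper.

```python
import random


def sample_preference(num_objectives, rng):
    """Draw one preference vector (assumed: uniform over the simplex).

    Normalizing independent Exponential(1) draws gives a
    Dirichlet(1, ..., 1) sample, i.e. a uniform point on the simplex.
    """
    draws = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(reward_vector, preference):
    """Assumed linear scalarization: weighted sum of per-objective rewards."""
    return sum(w * r for w, r in zip(preference, reward_vector))


# Each sampled preference defines one task in the meta-learning view.
rng = random.Random(0)
tasks = [sample_preference(2, rng) for _ in range(4)]

# The same vector reward yields a different scalar reward under each task.
reward = [1.0, -0.5]
scalar_rewards = [scalarize(reward, w) for w in tasks]
```

Under these assumptions, every sampled preference induces an ordinary single-objective RL problem, which is what allows a distribution over preferences to play the role of a distribution over tasks.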
