Provable Multi-Objective Reinforcement Learning with Generative Models

Abstract

Multi-objective reinforcement learning (MORL) is an extension of ordinary, single-objective reinforcement learning (RL) that is applicable to many real-world tasks where multiple objectives exist without known relative costs. We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives. Existing methods require strong assumptions such as exact knowledge of the multi-objective Markov decision process, and are analyzed in the limit of infinite data and time. We propose a new algorithm called model-based envelop value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm in Yang et al. (2019). Our method can learn a near-optimal value function with polynomial sample complexity and linear convergence speed. To the best of our knowledge, this is the first finite-sample analysis of MORL algorithms.
