Equivalence Between Wasserstein and Value-Aware Loss for Model-based Reinforcement Learning

Abstract

Learning a generative model is a key component of model-based reinforcement learning. Though learning a good model in the tabular setting is a simple task, learning a useful model in the approximate setting is challenging. In this context, an important question is the choice of loss function used for model learning, as varying the loss function can have a remarkable impact on the effectiveness of planning. Recently, Farahmand et al. (2017) proposed a value-aware model learning (VAML) objective that captures the structure of the value function during model learning. Using tools from Asadi et al. (2018), we show that minimizing the VAML objective is in fact equivalent to minimizing the Wasserstein metric. This equivalence improves our understanding of value-aware models, and also creates a theoretical foundation for applications of Wasserstein in model-based reinforcement learning.
