Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning

  • 2022-01-01 19:52:38
  • Ziyang Tang, Yihao Feng, Qiang Liu

Abstract

Reinforcement learning (RL) has drawn increasing interest in recent years due to its tremendous success in various applications. However, standard RL algorithms can only be applied to a single reward function and cannot adapt quickly to an unseen reward function. In this paper, we advocate a general operator view of reinforcement learning, which enables us to directly approximate the operator that maps a reward function to its value function. The benefit of learning the operator is that we can take any new reward function as input and obtain its corresponding value function in a zero-shot manner. To approximate this special type of operator, we design a number of novel operator neural network architectures based on its theoretical properties. Our operator networks outperform existing methods and the standard design of general-purpose operator networks, and we demonstrate the benefit of our operator deep Q-learning framework in several tasks, including reward transferring for offline policy evaluation (OPE) and reward transferring for offline policy optimization.
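
To make the operator view concrete, here is a minimal sketch of the simplest case: a small tabular Markov chain under a fixed policy, where the map from reward to value function is the linear operator (I - gamma * P)^{-1}. This is not the paper's neural architecture; the transition matrix, reward vectors, and the helper name `evaluation_operator` are hypothetical and chosen only for illustration.

```python
import numpy as np


def evaluation_operator(P_pi: np.ndarray, gamma: float) -> np.ndarray:
    """Return G = (I - gamma * P_pi)^{-1}, the linear map from rewards to values.

    P_pi is the state-to-state transition matrix induced by a fixed policy.
    """
    n = P_pi.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P_pi)


# Hypothetical 3-state chain under some fixed policy (each row sums to 1).
P_pi = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
    [0.2, 0.0, 0.8],
])
gamma = 0.9

# The operator is computed once, without reference to any reward.
G = evaluation_operator(P_pi, gamma)

# Two different reward functions, given as vectors over states.
r_a = np.array([1.0, 0.0, 0.0])
r_b = np.array([0.0, 0.0, 1.0])

# "Zero-shot" evaluation: a new reward needs only one application of G.
v_a = G @ r_a
v_b = G @ r_b

print("value under r_a:", v_a)
print("value under r_b:", v_b)
```

In this tabular setting the operator can be computed exactly. The abstract's setting instead approximates the operator with a neural network, and for policy optimization the map from reward to optimal value is no longer linear.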

 
