Provably Efficient Reward Transfer in Reinforcement Learning with Discrete Markov Decision Processes

  • 2025-10-22 17:22:42
  • Kevin Vora, Yu Zhang

Abstract

In this paper, we propose a new solution to reward adaptation (RA) in reinforcement learning, where the agent adapts to a target reward function based on one or more existing source behaviors learned a priori under the same domain dynamics but different reward functions. While learning the target behavior from scratch is possible, it is often inefficient given the available source behaviors. Our work introduces a new approach to RA through the manipulation of Q-functions. Assuming the target reward function is a known function of the source reward functions, we compute bounds on the Q-function and present an iterative process (akin to value iteration) to tighten these bounds. Such bounds enable action pruning in the target domain before learning even starts. We refer to this method as "Q-Manipulation" (Q-M). The iteration process assumes access to a lite-model, which is easy to provide or learn. We formally prove that Q-M, under discrete domains, does not affect the optimality of the returned policy and show that it is provably efficient in terms of sample complexity in a probabilistic sense. Q-M is evaluated in a variety of synthetic and simulation domains to demonstrate its effectiveness, generalizability, and practicality.
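
To illustrate how Q-function bounds can enable action pruning, the following is a minimal sketch of the standard elimination rule, not the paper's Q-M algorithm. It assumes valid lower and upper bounds on the optimal target Q-function are already available as tables indexed by state and action; the function name `prune_actions` and the numbers are hypothetical. An action is dropped in a state when its upper bound falls below the best lower bound in that state, so an optimal action is never removed.

```python
def prune_actions(q_lower, q_upper):
    """Return keep[s][a]: False where action a is provably suboptimal in state s.

    Assumes q_lower[s][a] <= Q*(s, a) <= q_upper[s][a] for every state-action pair.
    """
    keep = []
    for lows, ups in zip(q_lower, q_upper):
        # Some action is guaranteed to achieve at least this value in the state.
        best_lower = max(lows)
        # An action whose upper bound is below that value cannot be optimal.
        keep.append([u >= best_lower for u in ups])
    return keep


# Hypothetical bounds for 2 states and 3 actions.
q_lower = [[1.0, 0.0, 2.0],
           [0.5, 0.0, 0.8]]
q_upper = [[3.0, 1.5, 4.0],
           [1.0, 0.4, 2.0]]

keep = prune_actions(q_lower, q_upper)
print(keep)
```

Tighter bounds prune more actions, which is why an iterative tightening step before learning can shrink the action set the agent has to explore.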

 
