Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning

Abstract

Offline reinforcement learning (learning a policy from a batch of data) is known to be hard: without making strong assumptions, it is easy to construct counterexamples such that existing algorithms fail. In this work, we instead consider a property of certain real-world problems where offline reinforcement learning should be effective: those where actions only have limited impact for a part of the state. We formalize and introduce this Action Impact Regularity (AIR) property. We further propose an algorithm that assumes and exploits the AIR property, and bound the suboptimality of the output policy when the MDP satisfies AIR. Finally, we demonstrate that our algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in two simulated environments where the regularity holds.
