Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation

Abstract

In real-world scenarios, the application of reinforcement learning is significantly challenged by complex non-stationarity. Most existing methods attempt to model changes in the environment explicitly, often requiring impractical prior knowledge of environments. In this paper, we propose a new perspective, positing that non-stationarity can propagate and accumulate through complex causal relationships during state transitions, thereby compounding its sophistication and affecting policy learning. We believe that this challenge can be more effectively addressed by implicitly tracing the causal origin of non-stationarity. To this end, we introduce the Causal-Origin REPresentation (COREP) algorithm. COREP primarily employs a guided updating mechanism to learn a stable graph representation for the state, termed the causal-origin representation. By leveraging this representation, the learned policy exhibits impressive resilience to non-stationarity. We supplement our approach with a theoretical analysis grounded in the causal interpretation for non-stationary reinforcement learning, advocating for the validity of the causal-origin representation. Experimental results further demonstrate the superior performance of COREP over existing methods in tackling non-stationarity problems.
