Abstract
In real-world scenarios, the application of reinforcement learning is significantly challenged by complex non-stationarity. Most existing methods attempt to model changes in the environment explicitly, often requiring impractical prior knowledge of environments. In this paper, we propose a new perspective, positing that non-stationarity can propagate and accumulate through complex causal relationships during state transitions, thereby compounding its sophistication and affecting policy learning. We believe that this challenge can be more effectively addressed by implicitly tracing the causal origin of non-stationarity. To this end, we introduce the Causal-Origin REPresentation (COREP) algorithm. COREP primarily employs a guided updating mechanism to learn a stable graph representation for the state, termed the causal-origin representation. By leveraging this representation, the learned policy exhibits impressive resilience to non-stationarity. We supplement our approach with a theoretical analysis grounded in the causal interpretation of non-stationary reinforcement learning, supporting the validity of the causal-origin representation. Experimental results further demonstrate the superior performance of COREP over existing methods in tackling non-stationarity problems.