Abstract
[Context] Multi-agent reinforcement learning (MARL) has achieved notable success in environments where agents must learn coordinated behaviors. However, transferring knowledge across agents remains challenging in non-stationary environments with changing goals. [Problem] Traditional knowledge transfer methods in MARL struggle to generalize, and agents often require costly retraining to adapt. [Approach] This paper introduces a causal knowledge transfer framework that enables RL agents to learn and share compact causal representations of paths within a non-stationary environment. As the environment changes (new obstacles appear), collisions force agents to adopt adaptive recovery strategies. We model each collision as a causal intervention, instantiated as a sequence of recovery actions (a macro) whose effect corresponds to causal knowledge of how to circumvent the obstacle while increasing the chances of achieving the agent's goal (maximizing cumulative reward). This recovery action macro is transferred online from a second agent and applied in a zero-shot fashion, i.e., without retraining, simply by querying a lookup model with local context information (collisions). [Results] Our findings reveal two key insights: (1) agents with heterogeneous goals were able to bridge about half of the gap between random exploration and a fully retrained policy when adapting to new environments, and (2) the impact of causal knowledge transfer depends on the interplay between environment complexity and agents' heterogeneous goals.