Abstract
Autonomous exploration in complex multi-agent reinforcement learning (MARL) with sparse rewards critically depends on providing agents with effective intrinsic motivation. While artificial curiosity offers a powerful self-supervised signal, it often confuses environmental stochasticity with meaningful novelty. Moreover, existing curiosity mechanisms exhibit a uniform novelty bias, treating all unexpected observations equally. The novelty of peer behavior, which encodes latent task dynamics, is therefore often overlooked, resulting in suboptimal exploration in decentralized, communication-free MARL settings. To this end, inspired by how human children adaptively calibrate their own exploratory behaviors by observing peers, we propose a novel approach to enhance multi-agent exploration. We introduce CERMIC, a principled framework that empowers agents to robustly filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. Additionally, CERMIC generates theoretically grounded intrinsic rewards, encouraging agents to explore state transitions with high information gain. We evaluate CERMIC on benchmark suites including VMAS, Meltingpot, and SMACv2. Empirical results demonstrate that exploration with CERMIC significantly outperforms state-of-the-art algorithms in sparse-reward environments.