Abstract
The exponential growth of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, addressing challenges in climate monitoring, disaster management, and more. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to handle the real-time decision-making demands of dynamic EO missions, necessitating the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by modelling single-satellite operations and extending to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainties in satellite observations, and the complexities of decentralised coordination under partial observability. By leveraging a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art MARL algorithms, including PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. The insights gained from this study provide a foundation for autonomous satellite operations, offering practical guidelines for improving policy learning in decentralised EO missions.