Revisiting Space Mission Planning: A Reinforcement Learning-Guided Approach for Multi-Debris Rendezvous

Abstract

This research introduces a novel application of a masked Proximal Policy Optimization (PPO) algorithm, drawn from deep reinforcement learning (RL), to determine the most efficient sequence for visiting space debris, using Izzo's adaptation of the Lambert solver for each individual rendezvous. The aim is to order the visits to all given debris so that the total rendezvous time for the entire mission is minimized. A neural network (NN) policy is developed and trained on simulated space missions with varying debris fields. After training, the neural network computes approximately optimal paths using Izzo's adaptation of Lambert maneuvers. Performance is evaluated against standard heuristics in mission planning. The reinforcement learning approach demonstrates a significant improvement in planning efficiency, reducing the total mission time by an average of approximately 10.96% and 13.66% compared to the Genetic and Greedy algorithms, respectively. On average, the model identifies the most time-efficient sequence for debris visitation across the simulated scenarios, and it does so with the fastest computation time of the methods compared. This approach marks a step forward in mission planning strategies for space debris clearance.
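
To illustrate the "masked" part of masked PPO, the following is a minimal sketch of action masking for a visitation-sequence policy. It is not the authors' implementation: the function name `masked_action_probs`, the `logits` and `visited` inputs, and the example values are all hypothetical. The idea shown is that debris already visited receive zero probability, so the policy can only choose among the remaining targets.

```python
import math


def masked_action_probs(logits, visited):
    """Softmax over policy logits, with already-visited debris masked out.

    logits  : list of floats, one raw policy score per debris (hypothetical input)
    visited : list of bools, True where the debris has already been visited
    Returns a list of probabilities that sums to 1 over unvisited debris.
    """
    if len(logits) != len(visited):
        raise ValueError("logits and visited must have the same length")

    valid = [score for score, done in zip(logits, visited) if not done]
    if not valid:
        raise ValueError("no unvisited debris left to choose from")

    # Subtract the largest valid logit for numerical stability.
    peak = max(valid)
    weights = [
        0.0 if done else math.exp(score - peak)
        for score, done in zip(logits, visited)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Example: three debris, the second one already visited.
probs = masked_action_probs([1.0, 2.0, 3.0], [False, True, False])
```

In a training loop, the next target would be sampled from these probabilities, the corresponding `visited` entry set to True, and the step repeated until every debris has been visited.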
