Abstract
In this paper, a novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association in non-terrestrial networks (NTNs). Traditional reinforcement learning (RL) methods for wireless network optimization often rely on manually designed reward functions, which can require extensive parameter tuning. To overcome these limitations, we employ inverse RL (IRL), specifically the GAIL framework, to learn reward functions automatically rather than designing them by hand. We augment this framework with an asynchronous federated learning approach, enabling decentralized multi-satellite systems to collaboratively derive optimal policies. The proposed method aims to maximize spectrum efficiency (SE) while meeting minimum information rate requirements for RUEs. To address the non-convex, NP-hard nature of this problem, we combine many-to-one matching theory with a multi-agent asynchronous federated IRL (MA-AFIRL) framework. This allows agents to learn through asynchronous interactions with the environment, improving training efficiency and scalability. The expert policy is generated using the whale optimization algorithm (WOA), which provides the data used to train the automatic reward function within GAIL. Simulation results show that the proposed MA-AFIRL method outperforms traditional RL approaches, achieving a $14.6\%$ improvement in convergence and reward value. The proposed GAIL-driven policy learning establishes a new benchmark for 6G NTN optimization.
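In GAIL the reward is not hand-designed. Instead, it is derived from a discriminator trained to tell expert state-action pairs (here, those produced by the WOA expert) apart from the learner's. The minimal sketch below illustrates only that reward-learning idea, under simplifying assumptions: a linear logistic discriminator stands in for a neural one, the two-dimensional feature vectors are hypothetical placeholders for encoded beamforming/spectrum/association decisions, and the surrogate reward $-\log(1 - D)$ is one common GAIL choice, not necessarily the one used in this work. Federation, matching, and the policy update itself are omitted.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticDiscriminator:
    """D(x) = sigmoid(w.x + b): estimated probability that the feature vector x
    (an encoded state-action pair) came from the expert rather than the learner.
    A linear model is an illustrative stand-in for a neural discriminator."""

    def __init__(self, dim: int, lr: float = 0.1) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, expert: List[Sequence[float]], learner: List[Sequence[float]]) -> None:
        # One gradient-ascent step on E_expert[log D] + E_learner[log(1 - D)].
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        labelled = [(x, 1.0) for x in expert] + [(x, 0.0) for x in learner]
        for x, label in labelled:
            err = label - self.prob_expert(x)  # derivative of log-likelihood w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(labelled)
        self.w = [wi + self.lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b += self.lr * grad_b / n

    def reward(self, x: Sequence[float]) -> float:
        # Surrogate reward: grows as the discriminator judges x more expert-like.
        d = self.prob_expert(x)
        return -math.log(max(1.0 - d, 1e-12))


# Hypothetical data: expert (e.g. WOA-generated) vs. current learner features.
expert_pairs = [[1.0, 0.9], [0.9, 1.0]]
learner_pairs = [[0.1, 0.2], [0.2, 0.0]]

disc = LogisticDiscriminator(dim=2, lr=0.5)
for _ in range(500):
    disc.update(expert_pairs, learner_pairs)

print(disc.reward([1.0, 1.0]), disc.reward([0.0, 0.0]))
```

In a full GAIL loop, the learner's policy would then be optimized with an RL algorithm against `disc.reward`, and the discriminator retrained on fresh learner samples, alternating until the learner's behavior is hard to distinguish from the expert's.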