Resilient UAV Trajectory Planning via Few-Shot Meta-Offline Reinforcement Learning

Abstract

Reinforcement learning (RL) has emerged as a promising component of future beyond-5G and 6G systems. Its main advantage lies in robust, model-free decision-making in complex, high-dimensional wireless environments. However, most existing RL frameworks rely on online interaction with the environment, which might not be feasible due to safety and cost concerns. Another problem with online RL is that the designed algorithm scales poorly to dynamic or new environments. This work proposes a novel, resilient, few-shot meta-offline RL algorithm that combines offline RL using conservative Q-learning (CQL) with meta-learning using model-agnostic meta-learning (MAML). The proposed algorithm can train RL models using static offline datasets without any online interaction with the environments. In addition, with the aid of MAML, the proposed model can be scaled up to new, unseen environments. We showcase the proposed algorithm by optimizing an unmanned aerial vehicle (UAV)'s trajectory and scheduling policy to minimize the age-of-information (AoI) and transmission power of limited-power devices. Numerical results show that the proposed few-shot meta-offline RL algorithm converges faster than baseline schemes, such as deep Q-networks and CQL. In addition, it is the only algorithm that can achieve optimal joint AoI and transmission power using an offline dataset with few shots of data points, and it is resilient to network failures due to unprecedented environmental changes.
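
To make the offline component concrete, the following is a minimal sketch of the conservative term commonly used in CQL for discrete actions: a log-sum-exp over all action values minus the value of the action actually logged in the dataset, added to a squared TD error. It is an illustration under stated assumptions, not the paper's exact objective; the function names (`logsumexp`, `cql_penalty`, `cql_loss`), the weight `alpha`, and the tabular list-of-lists representation of Q-values are all hypothetical, and the MAML outer loop is not shown.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cql_penalty(q_values, actions):
    """Mean over the batch of logsumexp_a Q(s, a) - Q(s, a_logged).

    q_values: one list of per-action Q-values for each state in the batch.
    actions:  index of the action stored in the offline dataset per state.
    The term is non-negative and pushes down Q-values of unseen actions.
    """
    gaps = [logsumexp(q) - q[a] for q, a in zip(q_values, actions)]
    return sum(gaps) / len(gaps)


def cql_loss(q_values, actions, td_targets, alpha=1.0):
    """Half mean squared TD error plus alpha times the conservative penalty.

    alpha is an assumed trade-off weight, not a value from the paper.
    """
    td = sum(
        (q[a] - y) ** 2 for q, a, y in zip(q_values, actions, td_targets)
    ) / (2 * len(actions))
    return td + alpha * cql_penalty(q_values, actions)
```

In a meta-learning setting such as the one described above, a loss of this kind would typically be evaluated per task on a small offline batch, with MAML adapting shared initial parameters across tasks; that wiring is left out here because the abstract does not specify it.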
