Reinforcement Learning-Based Trajectory Design for the Aerial Base Stations

Abstract

In this paper, the trajectory optimization problem for a multi-aerial base station (ABS) communication network is investigated. The objective is to find the trajectories of the ABSs so that the sum-rate of the users served by each ABS is maximized. To reach this goal, optimal power and sub-channel allocation is needed alongside optimal trajectory design, so that users are served at the highest possible data rates. To solve this complicated problem, we divide it into two sub-problems: an ABS trajectory optimization sub-problem, and a joint power and sub-channel assignment sub-problem. Then, based on the Q-learning method, we develop a distributed algorithm that solves these sub-problems efficiently and does not need a significant amount of information exchange between the ABSs and the core network. Simulation results show that, although Q-learning is a model-free reinforcement learning technique, it has a remarkable capability to train the ABSs to optimize their trajectories based on the received reward signals, which carry useful information about the topology of the network.
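As a rough illustration of the Q-learning idea mentioned in the abstract, the sketch below trains a single ABS on a small grid with a tabular Q-learning update. It is not the paper's algorithm: the grid size, the action set, the distance-based sum-rate proxy used as the reward, the user positions, and all hyperparameters are assumptions made for illustration, and power and sub-channel allocation are left out entirely.

```python
import math
import random

# Hypothetical action set: north, south, east, west, hover.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]


def sum_rate(pos, users, height=1.0):
    """Toy reward: sum of log2(1 + SNR), with SNR assumed to be 1 / squared distance."""
    total = 0.0
    for ux, uy in users:
        d2 = (pos[0] - ux) ** 2 + (pos[1] - uy) ** 2 + height ** 2
        total += math.log2(1.0 + 1.0 / d2)
    return total


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def train(grid=5, users=((4, 4), (3, 4)), episodes=500, steps=20,
          epsilon=0.1, seed=0):
    """Train one ABS with epsilon-greedy exploration; returns the Q-table."""
    rng = random.Random(seed)
    q = {(x, y): [0.0] * len(ACTIONS)
         for x in range(grid) for y in range(grid)}
    for _ in range(episodes):
        state = (0, 0)  # assumed start position
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            dx, dy = ACTIONS[a]
            # Clip the move so the ABS stays inside the grid.
            nxt = (min(max(state[0] + dx, 0), grid - 1),
                   min(max(state[1] + dy, 0), grid - 1))
            q_update(q, state, a, sum_rate(nxt, users), nxt)
            state = nxt
    return q
```

In this toy setting the agent needs no model of the channel: it only observes the reward after each move, which is the sense in which the reward signal carries information about where the users are.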
