Abstract
This paper investigates the trajectory optimization problem for a communication network with multiple aerial base stations (ABSs). The objective is to find the trajectories of the ABSs that maximize the sum-rate of the users served by each ABS. Alongside trajectory design, optimal power and sub-channel allocation is also essential for supporting the users with the highest possible data rates. To solve this complicated problem, we divide it into two sub-problems: an ABS trajectory optimization sub-problem and a joint power and sub-channel assignment sub-problem. Then, based on the Q-learning method, we develop a distributed algorithm that solves these sub-problems efficiently and does not require a significant amount of information exchange between the ABSs and the core network. Simulation results show that, although Q-learning is a model-free reinforcement learning technique, it has a remarkable capability to train the ABSs to optimize their trajectories based on the received reward signals, which carry useful information about the topology of the network.
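As a rough illustration of the Q-learning step referred to above, the following is a minimal sketch of the standard tabular update with epsilon-greedy action selection. It is not the algorithm developed in this paper: the grid-move action set `ACTIONS`, the state encoding as grid coordinates, the function names, and the values of `alpha` and `gamma` are all assumptions made for the example, and the reward is simply a number supplied by the caller (for instance, a measured sum-rate).

```python
import random

# Hypothetical action set: an ABS moves one grid cell or hovers in place.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action_index) to a value; missing entries count as 0.0.
    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in range(len(ACTIONS)))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))
```

In a distributed setting, each ABS could hold its own table `q` and call `q_update` using only its locally observed reward, which is one way to read the limited information exchange mentioned above.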