Abstract
Reinforcement learning (RL) has shown promise for future intelligent wireless networks. Online RL has been adopted for radio resource management (RRM), replacing traditional schemes. However, because it relies on online interaction with the environment, its role is limited in practical, real-world problems where such interaction is not feasible. In addition, traditional RL falls short in handling the uncertainties and risks of real-world stochastic environments. To this end, we propose an offline and distributional RL scheme for the RRM problem, which enables offline training on a static dataset without any interaction with the environment and accounts for the sources of uncertainty through the distribution of the return. Simulation results demonstrate that the proposed scheme outperforms conventional resource management models. In addition, it is the only scheme that surpasses online RL, achieving a $16\%$ gain over it.