Koopman-Based Generalization of Deep Reinforcement Learning With Application to Wireless Communications

Abstract

Deep Reinforcement Learning (DRL) is a key machine learning technology driving progress across various scientific and engineering fields, including wireless communication. However, its limited interpretability and generalizability remain major challenges. In supervised learning, generalizability is commonly evaluated through the generalization error using information-theoretic methods. In DRL, the training data is sequential and not independent and identically distributed (i.i.d.), rendering traditional information-theoretic methods unsuitable for generalizability analysis. To address this challenge, this paper proposes a novel analytical method for evaluating the generalizability of DRL. Specifically, we first model the evolution of states and actions in trained DRL algorithms as unknown discrete, stochastic, and nonlinear dynamical functions. Then, we employ a data-driven identification method, the Koopman operator, to approximate these functions, and propose two interpretable representations. Based on these interpretable representations, we develop a rigorous mathematical approach to evaluate the generalizability of DRL algorithms. This approach is formulated using the spectral feature analysis of the Koopman operator, leveraging the $H_\infty$ norm. Finally, we apply this generalization analysis to compare the soft actor-critic method, widely recognized as a robust DRL approach, against the proximal policy optimization algorithm for an unmanned aerial vehicle-assisted mmWave wireless communication scenario.
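
To make the two main ingredients of the abstract concrete, the following is a minimal sketch, not the paper's actual method. It assumes the simplest finite-dimensional Koopman approximation (dynamic mode decomposition, i.e. a least-squares linear fit between consecutive snapshots) and a frequency-grid estimate of the discrete-time $H_\infty$ norm. The toy linear system `A_true`, the identity input and output matrices, and the function names `fit_koopman_dmd` and `hinf_norm_discrete` are all illustrative assumptions; in the paper's setting the snapshots would instead come from state and action trajectories of a trained DRL agent.

```python
import numpy as np

def fit_koopman_dmd(snapshots):
    """Least-squares linear operator K with x[k+1] ~= K @ x[k].

    snapshots: array of shape (n_features, n_steps), one column per time step.
    This is plain DMD, used here as an assumed stand-in for a Koopman
    approximation on a chosen set of observables.
    """
    x_prev = snapshots[:, :-1]
    x_next = snapshots[:, 1:]
    return x_next @ np.linalg.pinv(x_prev)

def hinf_norm_discrete(K, B, C, n_freq=2000):
    """Grid estimate of sup_w sigma_max(C (e^{jw} I - K)^{-1} B).

    Only meaningful when K is stable (spectral radius below 1); the grid
    gives a lower bound on the true supremum.
    """
    n = K.shape[0]
    worst = 0.0
    for w in np.linspace(0.0, np.pi, n_freq):
        resolvent_times_b = np.linalg.solve(np.exp(1j * w) * np.eye(n) - K, B)
        gain = np.linalg.svd(C @ resolvent_times_b, compute_uv=False)[0]
        worst = max(worst, float(gain))
    return worst

# Hypothetical toy trajectory standing in for logged DRL states.
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
x = np.array([1.0, 1.0])
columns = [x]
for _ in range(50):
    x = A_true @ x
    columns.append(x)
snapshots = np.stack(columns, axis=1)

K_hat = fit_koopman_dmd(snapshots)
spectral_radius = float(np.max(np.abs(np.linalg.eigvals(K_hat))))

# Assumed identity input/output maps: disturbances enter and are measured
# on every state component.
gain_bound = hinf_norm_discrete(K_hat, np.eye(2), np.eye(2))
```

In this sketch, a spectral radius below 1 indicates that the identified dynamics are stable, and a smaller `gain_bound` means that perturbations are amplified less in the worst case, which is the kind of quantity a comparison between two trained policies could be based on.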
