Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic

Abstract

In XR downlink transmission, energy-efficient power scheduling (EEPS) is essential for conserving power resources while delivering large data packets within hard latency constraints. Traditional constrained reinforcement learning (CRL) algorithms show promise in EEPS but still struggle with non-convex stochastic constraints, non-stationary data traffic, and sparse, delayed packet dropout feedback (rewards) in XR. To overcome these challenges, this paper models the EEPS in XR as a dynamic parameter-constrained Markov decision process (DP-CMDP) with a varying transition function linked to the non-stationary data traffic and solves it with a proposed context-aware constrained reinforcement learning (CACRL) algorithm, which consists of a context inference (CI) module and a CRL module. The CI module trains an encoder and multiple potential networks to characterize the current transition function and reshape the packet dropout rewards according to the context, transforming the original DP-CMDP into a general CMDP with immediate dense rewards. The CRL module employs a policy network to make EEPS decisions under this CMDP and optimizes the policy using a constrained stochastic successive convex approximation (CSSCA) method, which is better suited for non-convex stochastic constraints. Finally, theoretical analyses provide deep insights into the CACRL algorithm, while extensive simulations demonstrate that it outperforms advanced baselines in both conserving power and satisfying packet dropout constraints.
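
The reward-reshaping idea can be illustrated with a small sketch. It uses classic potential-based reward shaping, which is an assumption here: the abstract only says that potential networks reshape the packet dropout rewards, not which shaping rule is used. The function names, the hand-picked potential values, and the four-slot episode are all hypothetical; in the paper the potentials would come from trained, context-conditioned networks.

```python
# Minimal sketch of potential-based reward shaping (an assumed shaping rule,
# not necessarily the one used by the paper's CI module).

def shape_rewards(rewards, potentials, gamma=1.0):
    """Return dense rewards r_t + gamma * Phi(s_{t+1}) - Phi(s_t).

    rewards[t] is the reward for the transition s_t -> s_{t+1};
    potentials[t] is Phi(s_t), so it needs one more entry than rewards.
    """
    if len(potentials) != len(rewards) + 1:
        raise ValueError("potentials must have len(rewards) + 1 entries")
    return [
        r + gamma * potentials[t + 1] - potentials[t]
        for t, r in enumerate(rewards)
    ]


def discounted_return(rewards, gamma=1.0):
    """Sum of gamma**t * r_t over the episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical episode: the dropout penalty (-1) arrives only at the deadline.
sparse = [0.0, 0.0, 0.0, -1.0]
# Hypothetical potentials: a guess of the eventual penalty from each state,
# with the terminal state assigned potential 0.
phi = [-0.2, -0.4, -0.7, -1.0, 0.0]

dense = shape_rewards(sparse, phi)
print("shaped rewards:", dense)
print("sparse return:", discounted_return(sparse))
print("shaped return:", discounted_return(dense))
```

With gamma = 1 and a terminal potential of 0, the shaped return differs from the sparse return only by the constant Phi(s_0), so the feedback arrives at every slot while the ranking of policies from a given start state is unchanged.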
