Abstract
In extended reality (XR) downlink transmission, energy-efficient power scheduling (EEPS) is essential for conserving power resources while delivering large data packets within hard latency constraints. Traditional constrained reinforcement learning (CRL) algorithms show promise in EEPS but still struggle with non-convex stochastic constraints, non-stationary data traffic, and sparse, delayed packet dropout feedback (rewards) in XR. To overcome these challenges, this paper models EEPS in XR as a dynamic parameter-constrained Markov decision process (DP-CMDP) whose transition function varies with the non-stationary data traffic, and solves it with a proposed context-aware constrained reinforcement learning (CACRL) algorithm, which consists of a context inference (CI) module and a CRL module. The CI module trains an encoder and multiple potential networks to characterize the current transition function and to reshape the packet dropout rewards according to the context, transforming the original DP-CMDP into a general CMDP with immediate, dense rewards. The CRL module employs a policy network to make EEPS decisions under this CMDP and optimizes the policy using a constrained stochastic successive convex approximation (CSSCA) method, which is better suited to non-convex stochastic constraints. Finally, theoretical analyses provide insights into the CACRL algorithm, while extensive simulations demonstrate that it outperforms advanced baselines in both conserving power and satisfying packet dropout constraints.
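To illustrate how a potential function can turn sparse, delayed dropout feedback into immediate dense rewards, the following minimal sketch uses classical potential-based reward shaping. It is an assumption that the reshaping takes this form; the abstract does not specify it. The function names, the scalar backlog state, the hand-written potential (standing in for a learned potential network), and all numerical values are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def shape_rewards(
    states: Sequence[float],
    rewards: Sequence[float],
    potential: Callable[[float], float],
    gamma: float,
) -> List[float]:
    """Add the shaping term gamma * Phi(s_{t+1}) - Phi(s_t) to each reward.

    `states` must contain one more entry than `rewards`
    (s_0, ..., s_T for rewards r_0, ..., r_{T-1}).
    """
    if len(states) != len(rewards) + 1:
        raise ValueError("need len(states) == len(rewards) + 1")
    return [
        r + gamma * potential(states[t + 1]) - potential(states[t])
        for t, r in enumerate(rewards)
    ]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of a reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical episode: the state is the remaining backlog of one packet,
# and the only original feedback is a penalty of -1 at the deadline
# because part of the packet is still undelivered (a dropout).
states = [4.0, 3.0, 1.0, 1.0]
sparse_rewards = [0.0, 0.0, -1.0]


def backlog_potential(backlog: float) -> float:
    # Hand-written stand-in for a learned potential network (assumption):
    # a larger backlog is treated as a less favourable state.
    return -0.1 * backlog


dense_rewards = shape_rewards(states, sparse_rewards, backlog_potential, gamma=0.9)
```

With this form of shaping, every step receives a nonzero signal whenever the backlog changes, and the discounted return of the shaped sequence differs from the original one only by the boundary term `gamma**T * Phi(s_T) - Phi(s_0)`, which does not depend on the intermediate decisions.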