Abstract
Modern radio access networks (RANs) operate in highly dynamic and heterogeneous environments, where hand-tuned, rule-based radio resource management (RRM) algorithms often underperform. While reinforcement learning (RL) can surpass such heuristics in constrained settings, the diversity of deployments and unpredictable radio conditions introduce major generalization challenges. Data-driven policies frequently overfit to training conditions, which degrades performance in unseen scenarios. To address this, we propose a generalization-centered RL framework for RAN control that: (i) robustly reconstructs dynamically varying states from partial and noisy observations, while encoding static and semi-static information, such as radio nodes, cell attributes, and their topology, through graph representations; (ii) applies domain randomization to broaden the training distribution; and (iii) distributes data generation across multiple actors while centralizing training in a cloud-compatible architecture aligned with O-RAN principles. Although generalization increases computational and data-management complexity, our distributed design mitigates this by scaling data collection and training across diverse network conditions. Applied to downlink link adaptation in five 5G benchmarks, our policy improves average throughput and spectral efficiency by ~10% over an outer-loop link adaptation (OLLA) baseline with a 10% block error rate (BLER) target in full-buffer MIMO/mMIMO scenarios, and by >20% under high mobility. It matches specialized RL policies in full-buffer traffic and achieves up to 4-fold and 2-fold gains in the eMBB and mixed-traffic benchmarks, respectively. In nine-cell deployments, graph attention network (GAT) models deliver 30% higher throughput than multilayer perceptron (MLP) baselines. These results, combined with our scalable architecture, offer a path toward AI-native 6G RANs built on a single, generalizable RL agent.
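For readers unfamiliar with the OLLA baseline, the sketch below illustrates the general outer-loop idea under stated assumptions. It is not the baseline implementation used in this work. HARQ ACK/NACK feedback nudges an SINR offset so that the long-run BLER settles at the target (10% here). The ACK step size, the MCS threshold table, and the names `Olla` and `select_mcs` are hypothetical and chosen only for illustration.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Olla:
    """Minimal outer-loop link adaptation sketch (hypothetical, illustrative only)."""
    bler_target: float = 0.1   # 10% BLER target, matching the baseline described above
    step_up_db: float = 0.01   # assumed offset increase on ACK (illustrative value)
    offset_db: float = 0.0     # running SINR correction in dB

    @property
    def step_down_db(self) -> float:
        # NACK step chosen so the offset has zero expected drift when BLER == target:
        # (1 - target) * step_up == target * step_down
        return self.step_up_db * (1.0 - self.bler_target) / self.bler_target

    def update(self, ack: bool) -> None:
        # ACK -> be more aggressive; NACK -> back off by the larger step
        if ack:
            self.offset_db += self.step_up_db
        else:
            self.offset_db -= self.step_down_db

    def effective_sinr_db(self, reported_sinr_db: float) -> float:
        # Reported (e.g. CQI-derived) SINR corrected by the outer-loop offset
        return reported_sinr_db + self.offset_db


def select_mcs(sinr_db: float, thresholds_db: list[float]) -> int:
    """Pick the highest MCS index whose (assumed) SINR threshold is met; clamp at 0."""
    idx = bisect.bisect_right(thresholds_db, sinr_db) - 1
    return max(idx, 0)
```

With a 10% target, one NACK cancels nine ACKs, so the offset is stationary exactly when the observed BLER equals the target. An RL policy replaces this fixed feedback rule with a learned mapping from observations to MCS.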