Sim-to-Real Optimization of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

Abstract

The mobile network that millions of people use every day is one of the most complex systems in the real world. Optimizing a mobile network to meet exploding customer demand and reduce CAPEX/OPEX poses greater challenges than those addressed in prior works. Learning to solve complex real-world problems that benefit everyone and make the world better has long been an ultimate goal of AI. However, applying deep reinforcement learning (DRL) to complex real-world problems remains unsolved because of, among other factors, imperfect information, data scarcity, complex rules, and the potential negative impact on the real world. To bridge this reality gap, we propose a sim-to-real framework that directly transfers learning from simulation to the real world without any training in the real world. First, we distill the temporal-spatial relationships between cells and mobile users into a scalable 3D image-like tensor that best characterizes the partially observed mobile network. Second, inspired by AlphaGo, we introduce a novel self-play mechanism that empowers DRL agents to gradually improve their intelligence by competing for the best record on multiple tasks, just as athletes compete for the world record in the decathlon. Third, we propose a decentralized DRL method that coordinates multiple agents to compete and cooperate as a team, maximizing the global reward and minimizing potential negative impact. Using 7693 unseen test tasks over 160 unseen mobile networks in another simulator, as well as 6 field trials on 4 commercial mobile networks in the real world, we demonstrate that this sim-to-real framework can directly transfer learning not only from one simulator to another, but also from simulation to the real world. This is the first time that a DRL agent has successfully transferred its learning directly from simulation to very complex real-world problems with imperfect information, complex rules, huge state/action spaces, and multi-agent interactions.
