Sim-To-Real Optimization Of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

Abstract

The mobile network that millions of people use every day is one of the most complex systems in the real world. Optimizing a mobile network to meet exploding customer demand while reducing CAPEX/OPEX poses greater challenges than those addressed in prior work. Learning to solve complex real-world problems that benefit everyone and make the world better has long been an ultimate goal of AI. However, it remains an unsolved problem for deep reinforcement learning (DRL), given imperfect information in the real world, huge state/action spaces, the large amount of data needed for training and the associated time and cost, multi-agent interactions, potential negative impact on the real world, etc. To bridge this reality gap, we propose a DRL framework that directly transfers the optimal policy learned from multiple tasks in the source domain to unseen similar tasks in the target domain without any further training in either domain. First, we distill the temporal-spatial relationships between cells and mobile users into a scalable 3D image-like tensor to best characterize the partially observed mobile network. Second, inspired by AlphaGo, we use a novel self-play mechanism that empowers the DRL agent to gradually improve its intelligence by competing for the best record on multiple tasks. Third, we propose a decentralized DRL method that coordinates multiple agents to compete and cooperate as a team to maximize the global reward and minimize potential negative impact. Using 7693 unseen test tasks over 160 unseen simulated mobile networks and 6 field trials over 4 commercial mobile networks in the real world, we demonstrate the capability of our approach to directly transfer learning from one simulator to another, and from simulation to the real world. This is the first time that a DRL agent has successfully transferred its learning directly from simulation to very complex real-world problems with incomplete and imperfect information, huge state/action spaces, and multi-agent interactions.
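The idea of "competing for the best record on multiple tasks" can be illustrated with a small sketch. The abstract does not specify the reward used, so everything below is an assumption made for illustration: the class name `BestRecordTracker`, the +1/0/-1 reward values, and the choice to give a neutral reward on the first attempt at a task are all hypothetical, not the authors' method.

```python
class BestRecordTracker:
    """Hypothetical helper: tracks the best score seen so far for each task."""

    def __init__(self):
        # Maps task id -> best score recorded so far.
        self.best = {}

    def reward(self, task_id, score):
        """Return a self-play style reward by comparing against the record.

        Assumed scheme (not from the paper): +1 for beating the record,
        -1 for falling short, 0 for a tie or a first attempt.
        """
        previous = self.best.get(task_id)
        if previous is None:
            # First attempt on this task: it sets the record, neutral reward.
            self.best[task_id] = score
            return 0.0
        if score > previous:
            # New record: update it and reward the agent.
            self.best[task_id] = score
            return 1.0
        if score < previous:
            return -1.0
        return 0.0


tracker = BestRecordTracker()
rewards = [tracker.reward("task_a", s) for s in (1.0, 2.0, 1.5, 2.0)]
```

Under this assumed scheme the agent's opponent is its own history on each task, so the signal stays informative as the policy improves, which is one plausible reading of self-play in a single-agent optimization setting.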
