MAMRL: Exploiting Multi-agent Meta Reinforcement Learning in WAN Traffic Engineering

Abstract

Traffic optimization challenges, such as load balancing, flow scheduling, and improving packet delivery time, are difficult online decision-making problems in wide area networks (WANs). Complex heuristics are needed, for instance, to find optimal paths that improve packet delivery time and minimize interruptions caused by link failures or congestion. The recent success of reinforcement learning (RL) algorithms can provide useful solutions for building more robust systems that learn from experience in model-free settings. In this work, we consider a path optimization problem, specifically packet routing, in large complex networks. We develop and evaluate a model-free approach that applies multi-agent meta reinforcement learning (MAMRL) to determine the next hop of each packet so that it is delivered to its destination in minimum overall time. Specifically, we propose to leverage and compare deep policy optimization RL algorithms for enabling distributed model-free control in communication networks, and we present a novel meta-learning-based framework, MAMRL, for enabling quick adaptation to topology changes. To evaluate the proposed framework, we run simulations on various WAN topologies. Our extensive packet-level simulation results show that, compared to classical shortest-path and traditional reinforcement learning approaches, MAMRL significantly reduces the average packet delivery time even when network demand increases. Compared to a non-meta deep policy optimization algorithm, MAMRL reduces packet loss in far fewer episodes when link failures occur, while offering comparable average packet delivery time.
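
To make the next-hop routing problem concrete, the following is a minimal sketch of the classical shortest-path baseline that the abstract compares against. It is not the authors' simulator or the MAMRL method. The four-node topology, the link delays, and the helper names `next_hop_table` and `remove_link` are all hypothetical and chosen only for illustration; links are assumed to be symmetric.

```python
import heapq

def next_hop_table(graph, dest):
    """Return {node: next hop} along a minimum-delay path to `dest`.

    `graph` maps node -> {neighbor: link delay}; links are assumed symmetric.
    Runs Dijkstra's algorithm outward from the destination.
    """
    dist = {dest: 0.0}
    next_hop = {}
    heap = [(0.0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, delay in graph[u].items():
            nd = d + delay
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                next_hop[v] = u  # v forwards toward dest via u
                heapq.heappush(heap, (nd, v))
    return next_hop

def remove_link(graph, a, b):
    """Return a copy of `graph` with the link a-b removed (a link failure)."""
    return {
        n: {m: w for m, w in nbrs.items() if {n, m} != {a, b}}
        for n, nbrs in graph.items()
    }

# Hypothetical topology: two paths from A to D (via B or via C).
topology = {
    "A": {"B": 1.0, "C": 2.0},
    "B": {"A": 1.0, "D": 1.0},
    "C": {"A": 2.0, "D": 2.0},
    "D": {"B": 1.0, "C": 2.0},
}

before = next_hop_table(topology, "D")
after = next_hop_table(remove_link(topology, "B", "D"), "D")
print(before["A"], after["A"])
```

In this sketch the table has to be recomputed from scratch whenever the topology changes, and it uses only static link delays, so it does not react to queueing under load. The abstract positions MAMRL as learning next-hop decisions from experience and adapting quickly to such topology changes.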
