Dynamic Preference Multi-Objective Reinforcement Learning for Internet Network Management

Abstract

An internet network service provider manages its network with multiple objectives, such as high quality of service (QoS) and minimum computing resource usage. To achieve these objectives, reinforcement learning (RL)-based algorithms have been proposed to train network management agents. Usually, these algorithms optimize their agents with respect to a single static reward formulation consisting of multiple objectives with fixed importance factors, which we call preferences. However, in practice, the preference could vary according to network status, external concerns and so on. For example, when a server shuts down and risks overloading other servers with traffic, leading to additional shutdowns, it is plausible to reduce the preference for QoS while increasing the preference for minimum computing resource usage. In this paper, we propose new RL-based network management agents that can select actions based on both states and preferences. With our proposed approach, we expect a single agent to generalize across various states and preferences. Furthermore, we propose a numerical method that can estimate the distribution of preferences that is advantageous for unbiased training. Our experimental results show that the RL agents trained with our proposed approach generalize significantly better across various preferences than previous RL approaches, which assume a static preference during training. Moreover, we present several analyses that show the advantages of our numerical estimation method.
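
To make the idea of a preference-conditioned agent concrete, the following is a minimal sketch and not the paper's implementation. It assumes the reward is a weighted sum of per-objective rewards (the abstract does not state the exact formulation), that the agent receives the preference concatenated with the state, and that preferences are drawn uniformly from the simplex as a placeholder for the paper's numerically estimated distribution. All function names and values are hypothetical.

```python
import random


def scalarize(reward_vector, preference):
    """Weighted sum of per-objective rewards (assumed scalarization)."""
    if len(reward_vector) != len(preference):
        raise ValueError("reward_vector and preference must have equal length")
    return sum(w * r for w, r in zip(preference, reward_vector))


def sample_preference(num_objectives, rng):
    """Draw non-negative weights that sum to 1.

    Normalized exponential draws are uniform on the simplex. This is only a
    placeholder; the paper estimates a training distribution numerically.
    """
    draws = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def agent_input(state, preference):
    """Concatenate state and preference so one policy conditions on both."""
    return list(state) + list(preference)


rng = random.Random(0)
preference = sample_preference(2, rng)   # e.g. [w_qos, w_resource]
state = [0.7, 0.2, 0.1]                  # hypothetical network features
reward_vector = [0.9, -0.4]              # hypothetical [QoS, -resource usage]

x = agent_input(state, preference)       # what the policy would observe
r = scalarize(reward_vector, preference) # scalar reward for this preference
```

A static-preference baseline corresponds to fixing `preference` once for all of training, whereas the dynamic setting resamples it (for example, per episode) so that the same agent sees many trade-offs.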
