Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II

Abstract

This paper proposes a weight-aware deep reinforcement learning (WADRL) approach to the multiobjective vehicle routing problem with time windows (MOVRPTW), with the aim of using a single deep reinforcement learning (DRL) model to solve the entire multiobjective optimization problem. The non-dominated sorting genetic algorithm II (NSGA-II) is then employed to refine the outcomes produced by WADRL, thereby mitigating the limitations of both approaches. First, we design an MOVRPTW model that balances the minimization of travel cost against the maximization of customer satisfaction. Next, we present a novel DRL framework that incorporates a transformer-based policy network. This network is composed of an encoder module, a weight embedding module in which the weights of the objective functions are incorporated, and a decoder module. NSGA-II is then used to optimize the solutions generated by WADRL. Finally, extensive experimental results demonstrate that our method outperforms existing and traditional methods. Because VRPTW has numerous constraints, generating initial solutions for NSGA-II can be time-consuming. Using the solutions generated by WADRL as the initial solutions for NSGA-II significantly reduces this time. Meanwhile, NSGA-II enhances the quality of the solutions generated by WADRL, resulting in solutions with better scalability. Notably, the weight-aware strategy significantly reduces the training time of DRL while achieving better results, enabling a single DRL model to solve the entire multiobjective optimization problem.
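The hand-off described above, in which solutions produced under different objective weights become the starting population for NSGA-II, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each candidate route set has already been scored as a pair (travel cost, dissatisfaction), where dissatisfaction is a hypothetical stand-in for customer satisfaction recast as a quantity to minimize. The function names `dominates`, `non_dominated`, and `best_for_weight` are illustrative only.

```python
from typing import List, Tuple

# Assumption: each solution is summarized by two objectives, both minimized:
# (travel_cost, dissatisfaction). Dissatisfaction is a hypothetical recasting
# of "maximize customer satisfaction" as a minimization objective.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates (the first Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_for_weight(points: List[Objectives], w: float) -> Objectives:
    """Pick the point minimizing the weighted sum w*cost + (1-w)*dissatisfaction."""
    return min(points, key=lambda p: w * p[0] + (1.0 - w) * p[1])


# Hypothetical objective values for solutions produced under different weights.
candidates = [(10.0, 0.5), (12.0, 0.2), (11.0, 0.6), (15.0, 0.1)]

# The non-dominated subset is a natural seed population for a genetic algorithm.
seed_population = non_dominated(candidates)
```

Here `(11.0, 0.6)` is discarded because `(10.0, 0.5)` is better in both objectives. The remaining three points represent different trade-offs, and varying `w` in `best_for_weight` selects among them.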
