Abstract
Deep reinforcement learning (DRL)-based frameworks, featuring Transformer-style policy networks, have demonstrated their efficacy across various vehicle routing problem (VRP) variants. However, the application of these methods to the multi-trip time-dependent vehicle routing problem (MTTDVRP) with maximum working hours constraints -- a pivotal element of urban logistics -- remains largely unexplored. This paper introduces a DRL-based method called the Simultaneous Encoder and Dual Decoder Attention Model (SED2AM), tailored for the MTTDVRP with maximum working hours constraints. The proposed method introduces a temporal locality inductive bias to the encoding module of the policy networks, enabling it to effectively account for the time-dependency in travel distance or time. The decoding module of SED2AM includes a vehicle selection decoder that selects a vehicle from the fleet, effectively associating trips with vehicles for functional multi-trip routing. Additionally, this decoding module is equipped with a trip construction decoder leveraged for constructing trips for the vehicles. This policy model is equipped with two classes of state representations, fleet state and routing state, providing the information needed for effective route construction in the presence of maximum working hours constraints. Experimental results using real-world datasets from two major Canadian cities not only show that SED2AM outperforms the current state-of-the-art DRL-based and metaheuristic-based baselines but also demonstrate its generalizability to solve larger-scale problems.