We present an end-to-end framework for solving Vehicle Routing Problem (VRP)using deep reinforcement learning. In this approach, we train a single modelthat finds near-optimal solutions for problem instances sampled from a givendistribution, only by observing the reward signals and following feasibilityrules. Our model represents a parameterized stochastic policy, and by applyinga policy gradient algorithm to optimize its parameters, the trained modelproduces the solution as a sequence of consecutive actions in real time,without the need to re-train for every new problem instance. Our method isfaster in both training and inference than a recent method that solves theTraveling Salesman Problem (TSP), with nearly identical solution quality. Onthe more general VRP, our approach outperforms classical heuristics onmedium-sized instances in both solution quality and computation time (aftertraining). Our proposed framework can be applied to variants of the VRP such asthe stochastic VRP, and has the potential to be applied more generally tocombinatorial optimization problems.