Abstract
In e-commerce markets, on-time delivery is of great importance to customer satisfaction. In this paper, we present a Deep Reinforcement Learning (DRL) approach for deciding how and when orders should be batched and picked in a warehouse to minimize the number of tardy orders. In particular, the technique supports deciding whether an order should be picked individually (pick-by-order) or picked in a batch with other orders (pick-by-batch), and if so, with which other orders. We formulate the problem as a semi-Markov decision process and develop a vector-based state representation that includes the characteristics of the warehouse system. This allows us to create a deep reinforcement learning solution that learns a strategy by interacting with the environment and solves the problem with a proximal policy optimization algorithm. We evaluate the performance of the proposed DRL approach by comparing it with several batching and sequencing heuristics in different problem settings. The results show that the DRL approach is able to develop a strategy that produces consistent, good solutions and performs better than the heuristics it is compared against.