Structure-Informed Deep Reinforcement Learning for Inventory Management

Abstract

This paper investigates the application of Deep Reinforcement Learning (DRL) to classical inventory management problems, with a focus on practical implementation considerations. We apply a DRL algorithm based on DirectBackprop to several fundamental inventory management scenarios including multi-period systems with lost sales (with and without lead times), perishable inventory management, dual sourcing, and joint inventory procurement and removal. The DRL approach learns policies across products using only historical information that would be available in practice, avoiding unrealistic assumptions about demand distributions or access to distribution parameters. We demonstrate that our generic DRL implementation performs competitively against or outperforms established benchmarks and heuristics across these diverse settings, while requiring minimal parameter tuning. Through examination of the learned policies, we show that the DRL approach naturally captures many known structural properties of optimal policies derived from traditional operations research methods. To further improve policy performance and interpretability, we propose a Structure-Informed Policy Network technique that explicitly incorporates analytically derived characteristics of optimal policies into the learning process. This approach can help interpretability and add robustness to the policy in out-of-sample performance, as we demonstrate in an example with realistic demand data. Finally, we provide an illustrative application of DRL in a non-stationary setting. Our work bridges the gap between data-driven learning and analytical insights in inventory management while maintaining practical applicability.
