A Simulation Environment and Reinforcement Learning Method for Waste Reduction

Abstract

In retail (e.g., grocery stores, apparel shops, online retailers), inventory managers have to balance short-term risk (no items to sell) with long-term risk (over-ordering leading to product waste). This balancing task is made especially hard by the lack of information about future customer purchases. In this paper, we study the problem of restocking a grocery store's inventory with perishable items over time, from a distributional point of view. The objective is to maximize sales while minimizing waste, with uncertainty about the actual consumption by customers. This problem is highly relevant today, given the growing demand for food and the impact of food waste on the environment, the economy, and purchasing power. We frame inventory restocking as a new reinforcement learning task that exhibits stochastic behavior conditioned on the agent's actions, making the environment partially observable. We make two main contributions. First, we introduce a new reinforcement learning environment, RetaiL, based on real grocery store data and expert knowledge. This environment is highly stochastic and presents a unique challenge for reinforcement learning practitioners. We show that uncertainty about the future behavior of the environment is not handled well by classical supply chain algorithms, and that distributional approaches are a good way to account for the uncertainty. Second, we introduce GTDQN, a distributional reinforcement learning algorithm that learns a generalized Tukey Lambda distribution over the reward space. GTDQN provides a strong baseline for our environment. It outperforms other distributional reinforcement learning approaches in this partially observable setting, in both overall reward and reduction of generated waste.
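
To make the distributional idea concrete, the sketch below evaluates and samples a generalized lambda distribution through its quantile function. It is not GTDQN itself: the abstract does not say which parameterization the paper uses, so the FKML form is assumed here, and the names `gld_quantile` and `gld_sample` are hypothetical.

```python
import numpy as np


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Quantile function of a generalized lambda distribution.

    Assumption: FKML parameterization with lam2 > 0 and lam3, lam4 != 0.
    lam1 is the location, lam2 the inverse scale, and lam3 / lam4 shape
    the left and right tails respectively.
    """
    u = np.asarray(u, dtype=float)
    left = (u ** lam3 - 1.0) / lam3
    right = ((1.0 - u) ** lam4 - 1.0) / lam4
    return lam1 + (left - right) / lam2


def gld_sample(n, lam1, lam2, lam3, lam4, seed=0):
    """Draw n samples by inverse-transform sampling: push uniforms through Q."""
    rng = np.random.default_rng(seed)
    return gld_quantile(rng.uniform(size=n), lam1, lam2, lam3, lam4)


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    levels = np.array([0.05, 0.5, 0.95])
    print(gld_quantile(levels, 0.0, 1.0, 0.2, 0.2))
    print(gld_sample(5, 0.0, 1.0, 0.2, 0.2))
```

Because four parameters determine every quantile, a network in this style would only need to output four numbers per action rather than a fixed grid of quantiles or atoms. Under the assumed parameterization, equal shape parameters `lam3 == lam4` place the median at `lam1`.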
