A Deep Q-Network for the Beer Game: A Reinforcement Learning Algorithm to Solve Inventory Optimization Problems

  • 2018-03-08 15:09:56
  • Afshin Oroojlooyjadid, MohammadReza Nazari, Lawrence Snyder, Martin Takáč

Abstract

The beer game is a widely used in-class game that is played in supply chain management classes to demonstrate the bullwhip effect. The game is a decentralized, multi-agent, cooperative problem that can be modeled as a serial supply chain network in which agents cooperatively attempt to minimize the total cost of the network even though each agent can only observe its own local information. Each agent chooses order quantities to replenish its stock. Under some conditions, a base-stock replenishment policy is known to be optimal. However, in a decentralized supply chain in which some agents (stages) may act irrationally (as they do in the beer game), there is no known optimal policy for an agent wishing to act optimally. We propose a machine learning algorithm, based on deep Q-networks, to optimize the replenishment decisions at a given stage. When playing alongside agents who follow a base-stock policy, our algorithm obtains near-optimal order quantities. It performs much better than a base-stock policy when the other agents use a more realistic model of human ordering behavior. Unlike most other algorithms in the literature, our algorithm does not have any limits on the beer game parameter values. Like any deep learning algorithm, training the algorithm can be computationally intensive, but this can be performed ahead of time; the algorithm executes in real time when the game is played. Moreover, we propose a transfer learning approach so that the training performed for one agent and one set of cost coefficients can be adapted quickly for other agents and costs. Our algorithm can be extended to other decentralized multi-agent cooperative games with partially observed information, which is a common type of situation in real-world supply chain problems.
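
The base-stock policy that the abstract uses as a benchmark can be illustrated with a minimal Python sketch of the standard order-up-to rule: order enough to raise the inventory position (on-hand stock plus stock on order, minus backorders) to a target level. The function name, argument names, and numbers below are illustrative assumptions and are not taken from the paper, which does not specify an implementation in this abstract.

```python
def base_stock_order(base_stock_level, on_hand, on_order, backorders):
    """Order quantity under a standard base-stock (order-up-to) rule.

    All names here are hypothetical, chosen for illustration only.
    """
    # Inventory position: what the stage has, plus what is already
    # inbound, minus what it still owes downstream.
    inventory_position = on_hand + on_order - backorders
    # Order the shortfall relative to the target; never order a negative amount.
    return max(0, base_stock_level - inventory_position)


# Example: target level 20, 5 units on hand, 8 in transit, 3 backordered.
print(base_stock_order(20, on_hand=5, on_order=8, backorders=3))
```

In this example the inventory position is 5 + 8 - 3 = 10, so the rule orders 10 units. A learned policy such as the one the abstract proposes would replace this fixed rule at one stage, while the other stages might still follow it.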

 
