Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling

Abstract

The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch, leading to sample inefficiency; it cannot leverage existing high-quality solutions; and it often yields suboptimal results compared to traditional methods like Constraint Programming (CP). We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD), which addresses these limitations by learning from previously generated solutions. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available, although our current evaluation focuses on benchmark problems. Offline-LD adapts two CQL-based Q-learning methods (mQRDQN and discrete mSAC) for maskable action spaces, introduces a novel entropy bonus modification for discrete SAC, and exploits reward normalization through preprocessing. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances. Notably, by introducing noise into the expert dataset, we achieve similar or better results than those obtained from the expert dataset, suggesting that a more diverse training set is preferable because it contains counterfactual information.
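To illustrate what a maskable action space means for a Q-learning dispatcher, the following is a minimal sketch, not the paper's implementation. It assumes that each action corresponds to dispatching one candidate operation and that a boolean mask marks which actions are currently feasible; the function name `masked_greedy_action` and its parameters are hypothetical.

```python
import math
from typing import Sequence


def masked_greedy_action(q_values: Sequence[float], valid_mask: Sequence[bool]) -> int:
    """Return the index of the highest-valued action among the valid ones.

    Hypothetical helper: q_values holds one estimate per action, and
    valid_mask[i] is True when action i may be dispatched in the current state.
    """
    if len(q_values) != len(valid_mask):
        raise ValueError("q_values and valid_mask must have the same length")
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")

    # Invalid actions get -inf so they can never win the argmax.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, valid_mask)]
    return max(range(len(masked)), key=masked.__getitem__)


# Action 1 has the largest raw value but is masked out, so action 0 is chosen.
print(masked_greedy_action([0.2, 1.5, -0.3], [True, False, True]))  # prints 0
```

Replacing masked values with negative infinity, rather than removing them, keeps action indices aligned with the original action space, which is convenient when the same mask must also be applied during training.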
