Integrating Human Knowledge Through Action Masking in Reinforcement Learning for Operations Research

Abstract

Reinforcement learning (RL) provides a powerful method for addressing problems in operations research. However, its real-world application often fails due to a lack of user acceptance and trust. A possible remedy is to give managers the ability to alter the RL policy by incorporating human expert knowledge. In this study, we analyze the benefits and caveats of including human knowledge via action masking. While action masking has so far been used to exclude invalid actions, its potential for integrating human expertise remains underexplored. Human knowledge is often encapsulated in heuristics, which suggest reasonable, near-optimal actions in certain situations. Enforcing such actions should therefore increase the human workforce's trust in, and willingness to rely on, the model's decisions. Yet strictly enforcing heuristic actions may also prevent the policy from exploring superior actions, leading to lower overall performance. We analyze the effects of action masking on three problems with different characteristics, namely paint shop scheduling, peak load management, and inventory management. Our findings demonstrate that incorporating human knowledge through action masking can achieve substantial improvements over policies trained without action masking. In addition, we find that action masking is crucial for learning effective policies in constrained action spaces, where certain actions can only be performed a limited number of times. Finally, we highlight the potential for suboptimal outcomes when action masks are overly restrictive.
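To make the mechanism concrete, the following is a minimal sketch of action masking for a policy over discrete actions, assuming the policy outputs one logit per action. Masked-out actions receive probability zero, so the agent can only sample actions the mask allows. The two mask builders are hypothetical illustrations, not the masks used in this study: `heuristic_mask` encodes an assumed reorder-point rule for an inventory problem, and `budget_mask` blocks limited-use actions whose remaining budget is exhausted, as in a constrained action space.

```python
import math
from typing import Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> list[float]:
    """Convert policy logits to action probabilities; masked-out actions get 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("mask must allow at least one action")
    # Subtract the largest allowed logit for numerical stability.
    max_allowed = max(l for l, m in zip(logits, mask) if m)
    weights = [math.exp(l - max_allowed) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def heuristic_mask(inventory: int, reorder_point: int, n_actions: int) -> list[bool]:
    """Hypothetical reorder-point heuristic; action i means 'order i units'.

    Below the reorder point, ordering nothing is masked out; at or above it,
    only 'order 0' stays allowed (a deliberately strict, illustrative rule).
    """
    if inventory < reorder_point:
        return [i > 0 for i in range(n_actions)]
    return [i == 0 for i in range(n_actions)]


def budget_mask(remaining_uses: Sequence[float]) -> list[bool]:
    """Hypothetical mask for limited-use actions: block any action with no uses left."""
    return [r > 0 for r in remaining_uses]


def combine_masks(*masks: Sequence[bool]) -> list[bool]:
    """An action is allowed only if every mask allows it."""
    return [all(flags) for flags in zip(*masks)]


if __name__ == "__main__":
    logits = [0.5, 1.0, 2.0, 0.1]
    mask = combine_masks(
        heuristic_mask(inventory=3, reorder_point=5, n_actions=4),
        budget_mask([float("inf"), 2, 1, 0]),
    )
    print(mask)  # [False, True, True, False]
    print(masked_softmax(logits, mask))
```

Because the mask is applied to the logits rather than to the environment, the same policy network can be trained with or without the heuristic; the strict rule in `heuristic_mask` also illustrates how an overly restrictive mask can remove actions that a learned policy might otherwise have preferred.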
