Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method

Abstract

This paper presents a novel reinforcement learning (RL) approach called HAAM-RL (Heuristic Algorithm-based Action Masking Reinforcement Learning) for optimizing the color batching re-sequencing problem in automobile painting processes. Existing heuristic algorithms struggle to reflect real-world constraints adequately and to predict logistics performance accurately. Our methodology incorporates several key techniques: a tailored Markov Decision Process (MDP) formulation, a reward design that includes Potential-Based Reward Shaping, action masking using heuristic algorithms (HAAM-RL), and an ensemble inference method that combines multiple RL models. The RL agent is trained and evaluated using FlexSim, a commercial 3D simulation software, integrated with our RL MLOps platform BakingSoDA. Experimental results across 30 scenarios demonstrate that HAAM-RL with the ensemble inference method achieves a 16.25% performance improvement over the conventional heuristic algorithm, with stable and consistent results. The proposed approach exhibits superior performance and generalization capability, indicating its effectiveness in optimizing complex manufacturing processes. The study also discusses future research directions, including alternative state representations, the incorporation of model-based RL methods, and the integration of additional real-world constraints.
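
As a rough illustration of how action masking and ensemble inference can fit together, the sketch below masks out actions that a heuristic rule disallows and then averages the masked action probabilities of several models before picking an action. The abstract does not specify how the mask is derived or how the models are combined, so the boolean mask, the soft-voting rule, and all names and numbers here (`masked_softmax`, `ensemble_action`, the example logits) are assumptions made for illustration only, not the authors' implementation.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits; actions with mask False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_action(all_logits, mask):
    """Soft voting (assumed): average masked probabilities, then take argmax."""
    n_actions = len(mask)
    avg = [0.0] * n_actions
    for logits in all_logits:
        probs = masked_softmax(logits, mask)
        for i, p in enumerate(probs):
            avg[i] += p / len(all_logits)
    best = max(range(n_actions), key=lambda i: avg[i])
    return best, avg


# Hypothetical example: 4 actions, and a heuristic rule forbids action 0
# even though every model scores it highest.
mask = [False, True, True, True]
model_logits = [
    [5.0, 1.0, 2.0, 0.0],
    [4.0, 0.5, 2.5, 0.0],
    [3.0, 2.0, 1.0, 0.0],
]
action, probs = ensemble_action(model_logits, mask)
print(action, [round(p, 3) for p in probs])
```

In this example the masked action can never be selected, regardless of how strongly the individual models prefer it, and a single model that disagrees with the others is outvoted by the averaged distribution.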
