Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming

  • 2025-03-20 18:33:08
  • Minori Narita, Ryo Kuroiwa, J. Christopher Beck

Abstract

Domain-Independent Dynamic Programming (DIDP) is a state-space search paradigm based on dynamic programming for combinatorial optimization. In its current implementation, DIDP guides the search using user-defined dual bounds. Reinforcement learning (RL) is increasingly being applied to combinatorial optimization problems and shares several key structures with DP, being represented by the Bellman equation and state-based transition systems. We propose using reinforcement learning to obtain a heuristic function to guide the search in DIDP. We develop two RL-based guidance approaches: value-based guidance using Deep Q-Networks and policy-based guidance using Proximal Policy Optimization. Our experiments indicate that RL-based guidance significantly outperforms standard DIDP and problem-specific greedy heuristics with the same number of node expansions. Further, despite longer node evaluation times, RL guidance achieves better run-time performance than standard DIDP on three of four benchmark domains.

 
