An Improved Reinforcement Learning Algorithm for Learning to Branch

Abstract

Most combinatorial optimization problems can be formulated as mixed integerlinear programming (MILP), in which branch-and-bound (B\&B) is a general andwidely used method. Recently, learning to branch has become a hot researchtopic in the intersection of machine learning and combinatorial optimization.In this paper, we propose a novel reinforcement learning-based B\&B algorithm.Similar to offline reinforcement learning, we initially train on thedemonstration data to accelerate learning massively. With the improvement ofthe training effect, the agent starts to interact with the environment with itslearned policy gradually. It is critical to improve the performance of thealgorithm by determining the mixing ratio between demonstration andself-generated data. Thus, we propose a prioritized storage mechanism tocontrol this ratio automatically. In order to improve the robustness of thetraining process, a superior network is additionally introduced based on DoubleDQN, which always serves as a Q-network with competitive performance. Weevaluate the performance of the proposed algorithm over three public researchbenchmarks and compare it against strong baselines, including three classicalheuristics and one state-of-the-art imitation learning-based branchingalgorithm. The results show that the proposed algorithm achieves the bestperformance among compared algorithms and possesses the potential to improveB\&B algorithm performance continuously.

Quick Read (beta)

loading the full paper ...