Abstract
In practice, it is quite common to face combinatorial optimization problemswhich contain uncertainty along with non-determinism and dynamicity. Thesethree properties call for appropriate algorithms; reinforcement learning (RL)is dealing with them in a very natural way. Today, despite some efforts, mostreal-life combinatorial optimization problems remain out of the reach ofreinforcement learning algorithms. In this paper, we propose a reinforcement learning approach to solve arealistic scheduling problem, and apply it to an algorithm commonly executed inthe high performance computing community, the Cholesky factorization. On thecontrary to static scheduling, where tasks are assigned to processors in apredetermined ordering before the beginning of the parallel execution, ourmethod is dynamic: task allocations and their execution ordering are decided atruntime, based on the system state and unexpected events, which allows muchmore flexibility. To do so, our algorithm uses graph neural networks incombination with an actor-critic algorithm (A2C) to build an adaptiverepresentation of the problem on the fly. We show that this approach is competitive with state-of-the-art heuristicsused in high-performance computing runtime systems. Moreover, our algorithmdoes not require an explicit model of the environment, but we demonstrate thatextra knowledge can easily be incorporated and improves performance. We alsoexhibit key properties provided by this RL approach, and study its transferabilities to other instances.