Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies

Abstract

Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-the-art RL systems, trained until convergence, to hit a performance ceiling which they are unable to break out of with zero-shot inference. Meanwhile, many digital or simulation-based applications allow for an inference phase that utilises a specific time and compute budget to explore multiple attempts before outputting a final solution. In this work, we show that such an inference phase employed at execution time, and the choice of a corresponding inference strategy, are key to breaking the performance ceiling observed in complex multi-agent RL problems. Our main result is striking: we can obtain up to a 126% and, on average, a 45% improvement over the previous state-of-the-art across 17 tasks, using only a couple of seconds of extra wall-clock time during execution. We also demonstrate promising compute scaling properties, supported by over 60k experiments, making it the largest study on inference strategies for complex RL to date. Our experimental data and code are available at https://sites.google.com/view/inference-strategies-rl.
