Abstract
Reinforcement Learning (RL) algorithms often struggle with low training efficiency. A common approach to address this challenge is integrating model-based planning algorithms, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), into the environmental model. However, VI requires iterating over a large tensor, updating the value of each preceding state from its succeeding state through value propagation, which is computationally intensive. To enhance RL training efficiency, we propose improving the efficiency of the value learning process. In deterministic environments with discrete state and action spaces, we observe that on the sampled empirical state-transition graph, a non-branching sequence of transitions, termed a highway, can take the agent to another state without deviation through intermediate states. On these non-branching highways, the value-updating process can be streamlined into a single-step operation, eliminating the need for step-by-step updates. Building on this observation, we introduce the highway graph to model state transitions. The highway graph compresses the transition model into a compact representation, where edges can encapsulate multiple state transitions, enabling value propagation across multiple time steps in a single iteration. By integrating the highway graph into RL, the training process is significantly accelerated, particularly in the early stages of training. Experiments across four categories of environments demonstrate that our method learns significantly faster than established and state-of-the-art RL algorithms (often by a factor of 10 to 150) while maintaining equal or superior expected returns. Furthermore, a deep neural network-based agent trained using the highway graph exhibits improved generalization capabilities and reduced storage costs. Code is publicly available at https://github.com/coodest/highwayRL.
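
To make the highway idea concrete, the following is a minimal sketch, not the implementation in the linked repository. It assumes a deterministic empirical transition graph stored as a dictionary, treats any state whose in-degree or out-degree differs from one as an intersection, and collapses each non-branching chain between intersections into a single edge carrying the discounted reward sum and the accumulated discount. All names (`build_highway_graph`, `value_iteration`, the toy `transitions` dictionary) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_highway_graph(transitions, gamma):
    """Collapse non-branching chains of a deterministic transition graph.

    transitions: {state: {action: (next_state, reward)}}  (assumed format)
    Returns {intersection: {action: (target, discounted_return, discount)}}.
    """
    states = set(transitions)
    in_deg = defaultdict(int)
    for acts in transitions.values():
        for nxt, _ in acts.values():
            in_deg[nxt] += 1
            states.add(nxt)

    def is_intersection(s):
        # Anything that is not "exactly one way in, exactly one way out".
        return in_deg[s] != 1 or len(transitions.get(s, {})) != 1

    highway = {s: {} for s in states if is_intersection(s)}
    for s in highway:
        for a, (nxt, r) in transitions.get(s, {}).items():
            ret, discount = r, gamma
            # Follow the highway until the next intersection is reached.
            while not is_intersection(nxt):
                nxt, r = next(iter(transitions[nxt].values()))
                ret += discount * r
                discount *= gamma
            highway[s][a] = (nxt, ret, discount)
    return highway


def value_iteration(highway, sweeps=100):
    """Value iteration over intersections only; one backup spans a whole highway."""
    values = {s: 0.0 for s in highway}
    for _ in range(sweeps):
        for s, edges in highway.items():
            if edges:
                values[s] = max(
                    ret + disc * values[nxt] for nxt, ret, disc in edges.values()
                )
    return values


# Toy graph: s0 branches; s1 -> s2 -> s3 -> goal is a non-branching highway.
transitions = {
    "s0": {"right": ("s1", 0.0), "left": ("dead", 0.0)},
    "s1": {"go": ("s2", 0.0)},
    "s2": {"go": ("s3", 0.0)},
    "s3": {"go": ("goal", 1.0)},
}
highway = build_highway_graph(transitions, gamma=0.5)
values = value_iteration(highway)
print(highway["s0"]["right"], values["s0"])
```

In this toy graph the four-step path from `s0` to `goal` becomes a single edge, so the value of `s0` is obtained in one backup rather than four successive sweeps over the intermediate states.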