Abstract
Reinforcement Learning (RL) has been widely used for packet routing in communication networks, but traditional RL methods rely on the Markov assumption that the current state contains all necessary information for decision-making. In reality, internet traffic is non-Markovian, and past states do influence routing performance. Moreover, common deep RL approaches use function approximators, such as neural networks, that do not model the spatial structure in network topologies. To address these shortcomings, we design a network environment with non-Markovian traffic and introduce a spatial-temporal RL (STRL) framework for packet routing. Our approach outperforms traditional baselines by more than 19% during training and 7% for inference despite a change in network topology.