Abstract
Reinforcement Learning (RL) can be considered as a sequence modeling task: given a sequence of past state-action-reward experiences, a model autoregressively predicts a sequence of future actions. Recently, Transformers have been successfully adopted to model this problem. In this work, we propose State-Action-Reward Transformer (StARformer), which explicitly models local causal relations to help improve action prediction in long sequences. StARformer first extracts local representations (i.e., StAR-representations) from each group of state-action-reward tokens within a very short time span. A sequence of such local representations, combined with state representations, is then used to make action predictions over a long time span. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on Atari (image) and Gym (state vector) benchmarks, in both offline-RL and imitation learning settings. StARformer is also more compliant with longer input sequences than the baseline. Our code is available at https://github.com/elicassion/StARformer.
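The two-stage idea described above can be illustrated with a minimal sketch. It is not the released implementation: mean pooling stands in for the local Transformer, a single unparameterized attention head stands in for the long-range Transformer, and local and state representations are combined by simple addition. The grouping of each state with the previous action and reward (so that the model never sees the action it is predicting), along with all function names, weight names, and dimensions, are assumptions made for illustration only.

```python
import math
import random


def linear(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, d columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def causal_attention(xs):
    """Single-head self-attention in which step t attends only to steps <= t."""
    d = len(xs[0])
    out = []
    for t, query in enumerate(xs):
        scores = [sum(q * k for q, k in zip(query, xs[u])) / math.sqrt(d)
                  for u in range(t + 1)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * xs[u][j] for u, w in enumerate(weights)) / total
                    for j in range(d)])
    return out


def predict_action_logits(states, actions, rewards, p):
    """Local stage per step, then a causal long-range stage over all steps."""
    steps = []
    for t, state in enumerate(states):
        # Assumed grouping: current state with the previous action and reward.
        prev_action = actions[t - 1] if t > 0 else [0.0] * len(actions[0])
        prev_reward = [rewards[t - 1]] if t > 0 else [0.0]
        group = [linear(state, p["w_s"]),
                 linear(prev_action, p["w_a"]),
                 linear(prev_reward, p["w_r"])]
        local = mean_vectors(group)  # stand-in for the local representation
        state_repr = linear(state, p["w_state"])  # separate state representation
        steps.append([a + b for a, b in zip(local, state_repr)])
    return [linear(h, p["w_out"]) for h in causal_attention(steps)]


def random_matrix(rng, rows, cols):
    """Small random weights; a trained model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


rng = random.Random(0)
T, s_dim, a_dim, d = 6, 4, 3, 8  # hypothetical sizes
p = {
    "w_s": random_matrix(rng, s_dim, d),
    "w_a": random_matrix(rng, a_dim, d),
    "w_r": random_matrix(rng, 1, d),
    "w_state": random_matrix(rng, s_dim, d),
    "w_out": random_matrix(rng, d, a_dim),
}
states = [[rng.gauss(0.0, 1.0) for _ in range(s_dim)] for _ in range(T)]
actions = [one_hot(rng.randrange(a_dim), a_dim) for _ in range(T)]
rewards = [rng.gauss(0.0, 1.0) for _ in range(T)]

logits = predict_action_logits(states, actions, rewards, p)
print(len(logits), len(logits[0]))  # 6 3
```

In this sketch each step's logits depend only on that step and earlier ones, which is the autoregressive property the sequence-modeling view relies on; perturbing the final state leaves all earlier predictions unchanged.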