Exploring the Role of Token in Transformer-based Time Series Forecasting

Abstract

Transformer-based methods are a mainstream approach for solving time series forecasting (TSF). These methods use temporal or variable tokens from observable data to make predictions. However, most focus on optimizing the model structure, with few studies paying attention to the role of tokens for predictions. The role is crucial since a model that distinguishes useful tokens from useless ones will predict more effectively. In this paper, we explore this issue. Through theoretical analyses, we find that the gradients mainly depend on tokens that contribute to the predicted series, called positive tokens. Based on this finding, we explore what helps models select these positive tokens. Through a series of experiments, we obtain three observations: i) positional encoding (PE) helps the model identify positive tokens; ii) as the network depth increases, the PE information gradually weakens, affecting the model's ability to identify positive tokens in deeper layers; iii) both enhancing PE in the deeper layers and using semantic-based PE can improve the model's ability to identify positive tokens, thus boosting performance. Inspired by these findings, we design temporal positional encoding (T-PE) for temporal tokens and variable positional encoding (V-PE) for variable tokens. To utilize T-PE and V-PE, we propose T2B-PE, a Transformer-based dual-branch framework. Extensive experiments demonstrate that T2B-PE has superior robustness and effectiveness.
