Transformers with Sparse Attention for Granger Causality

Abstract

Temporal causal analysis means understanding the underlying causes behind observed variables over time. Deep-learning-based methods such as transformers are increasingly used to capture temporal dynamics and causal relationships beyond mere correlations. Recent works suggest that the self-attention weights of transformers are a useful indicator of causal links. We leverage this to propose a novel modification to the self-attention module that establishes causal links between the variables of multivariate time-series data with varying lag dependencies. Our Sparse Attention Transformer captures causal relationships with a two-fold approach: it first performs temporal attention, then performs attention between the variables across the time steps, masking the variables individually to compute Granger Causality indices. The key novelty of our approach is the model's ability to assess importance and pick the most significant past time instances for its prediction task, rather than being manually fed a fixed time lag. We demonstrate the effectiveness of our approach through extensive experiments on several synthetic benchmark datasets. Furthermore, we compare the performance of our model with the traditional Vector Autoregression (VAR) based Granger Causality method, which assumes a fixed lag length.
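For reference, the fixed-lag VAR baseline mentioned above can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's transformer. It assumes ordinary least-squares fits and defines the Granger Causality index of variable j on target i as the log-ratio of residual sums of squares with j masked out versus with all variables present. The helper names (`lagged_design`, `granger_indices`) and the toy data are hypothetical illustrations. Masking one variable at a time mirrors the masking idea in the abstract, but the lag `p` here must be fixed by hand, which is the limitation the attention-based model is meant to remove.

```python
import numpy as np

def lagged_design(x, p, keep):
    """Design matrix of lags 1..p for the columns in `keep`, plus an intercept.

    x has shape (T, N); rows correspond to target times t = p..T-1.
    """
    T = x.shape[0]
    cols = [x[p - lag:T - lag, keep] for lag in range(1, p + 1)]
    return np.hstack([np.ones((T - p, 1))] + cols)

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def granger_indices(x, p):
    """gc[i, j] = ln(RSS with variable j masked / RSS with all variables)."""
    T, N = x.shape
    gc = np.zeros((N, N))
    all_vars = list(range(N))
    for i in range(N):
        y = x[p:, i]
        full = rss(lagged_design(x, p, all_vars), y)
        for j in range(N):
            if j == i:
                continue  # self-causation is not scored here
            keep = [k for k in all_vars if k != j]
            restricted = rss(lagged_design(x, p, keep), y)
            gc[i, j] = np.log(restricted / full)
    return gc

# Toy data (hypothetical): x0 drives x1 at lag 2; x2 is independent.
rng = np.random.default_rng(0)
T = 2000
x = np.zeros((T, 3))
e = rng.standard_normal((T, 3))
for t in range(2, T):
    x[t, 0] = 0.5 * x[t - 1, 0] + e[t, 0]
    x[t, 1] = 0.8 * x[t - 2, 0] + e[t, 1]
    x[t, 2] = 0.3 * x[t - 1, 2] + e[t, 2]

gc = granger_indices(x, p=3)
print(np.round(gc, 3))
```

Because the restricted model is nested in the full one, each index is non-negative; the entry for the true link (x0 causing x1) should stand well above the near-zero entries for the absent links, provided the chosen `p` covers the true lag of 2.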
