Abstract
Generative transformer models have become increasingly complex, with large numbers of parameters and the ability to process multiple input modalities. Current methods for explaining their predictions are resource-intensive. Most crucially, they require prohibitively large amounts of extra memory, since they rely on backpropagation, which allocates almost twice as much GPU memory as the forward pass. This makes it difficult, if not impossible, to use them in production. We present AtMan, which provides explanations of generative transformer models at almost no extra cost. Specifically, AtMan is a modality-agnostic perturbation method that manipulates the attention mechanisms of transformers to produce relevance maps for the input with respect to the output prediction. Instead of using backpropagation, AtMan applies a parallelizable token-based search method based on cosine similarity neighborhood in the embedding space. Our exhaustive experiments on text and image-text benchmarks demonstrate that AtMan outperforms current state-of-the-art gradient-based methods on several metrics while being computationally efficient. As such, AtMan is suitable for use in large model inference deployments.
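
To make the idea of attention-based perturbation concrete, the following is a minimal sketch on a toy single-head attention layer, not the actual AtMan implementation. Everything in it is an assumption made for illustration: the names `toy_forward` and `relevance_map`, the parameters `factor` and `kappa`, the choice to perturb by multiplicatively scaling pre-softmax attention scores, and the use of the change in cross-entropy loss as the relevance score. It shows one plausible way to score each input token using forward passes only, suppressing a token together with its cosine-similar neighbors in embedding space.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def toy_forward(emb, w_out, scale=None):
    """Toy model (assumption): one self-attention head plus a linear readout.

    emb:   (n_tokens, d) token embeddings
    w_out: (d, n_classes) readout weights
    scale: optional (n_tokens,) factors applied to the pre-softmax scores
           of each attended-to token (column-wise)
    """
    d = emb.shape[1]
    scores = emb @ emb.T / np.sqrt(d)          # (n_tokens, n_tokens)
    if scale is not None:
        scores = scores * scale[None, :]       # perturb attention paid to each token
    attn = softmax(scores, axis=-1)
    hidden = attn @ emb
    return hidden[-1] @ w_out                  # logits read from the last position


def cross_entropy(logits, target):
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]


def relevance_map(emb, w_out, target, factor=0.9, kappa=0.7):
    """Score each token by the loss change when its attention is suppressed.

    factor and kappa are hypothetical hyperparameters: the suppression
    strength and the cosine-similarity threshold for the neighborhood.
    """
    n = emb.shape[0]
    base_loss = cross_entropy(toy_forward(emb, w_out), target)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    cos = unit @ unit.T                        # pairwise cosine similarity
    relevance = np.zeros(n)
    for i in range(n):                         # iterations are independent
        scale = np.ones(n)
        scale[cos[i] >= kappa] = 1.0 - factor  # token i and its similar neighbors
        loss = cross_entropy(toy_forward(emb, w_out, scale), target)
        relevance[i] = loss - base_loss        # positive: token supported the target
    return relevance


rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))                  # 6 tokens, 8-dim embeddings
w_out = rng.normal(size=(8, 4))                # 4 output classes
print(relevance_map(emb, w_out, target=1))
```

No gradients are computed and no activations need to be stored for a backward pass. Because each loop iteration is an independent forward pass, the perturbed passes could in principle be batched and run in parallel. Note that scaling a negative score toward zero raises rather than lowers it, so a practical implementation would need to handle score signs more carefully than this sketch does.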