Neural Attention: Enhancing QKV Calculation in Self-Attention Mechanism with Neural Networks

Abstract

The self-attention mechanism has proven pivotal across a wide range of deep learning tasks, including natural language processing and computer vision. Despite this success, the traditional self-attention mechanism relies primarily on linear transformations to compute the query, key, and value (QKV), which may not always be the optimal choice. This paper explores a novel approach to QKV computation: using a specially designed neural network structure for the calculation. Using a modified Marian model, we conducted experiments on the IWSLT 2017 German-English translation task dataset and compared our method with the conventional approach. The experimental results show a significant improvement in BLEU scores with our method. Our approach also proved superior when training the RoBERTa model on the WikiText-103 dataset, yielding a notable reduction in model perplexity compared to its original counterpart. These results not only validate the efficacy of our method but also reveal the potential of optimizing the self-attention mechanism through neural network-based QKV computation, paving the way for future research and practical applications. The source code and implementation details for our proposed method can be accessed at https://github.com/ocislyjrti/NeuralAttention.
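To make the contrast between linear and network-based QKV computation concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the network structure, so the two-layer ReLU network, the layer sizes, and all function names here (`linear_proj`, `mlp_proj`, `attention`, `init_mlp`) are assumptions chosen for illustration; the linked repository is the place to look for the actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linear_proj(x, w, b):
    # Conventional QKV computation: a single linear transformation.
    return x @ w + b

def mlp_proj(x, w1, b1, w2, b2):
    # ASSUMPTION: a two-layer network with ReLU stands in for the paper's
    # "specially designed neural network structure".
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

def attention(q, k, v):
    # Standard scaled dot-product attention for a single head.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
seq_len, d_model, d_hidden = 4, 8, 16  # illustrative sizes only
x = rng.normal(size=(seq_len, d_model))

def init_mlp():
    # Hypothetical initialisation: small random weights, zero biases.
    return (
        rng.normal(scale=0.1, size=(d_model, d_hidden)), np.zeros(d_hidden),
        rng.normal(scale=0.1, size=(d_hidden, d_model)), np.zeros(d_model),
    )

# Baseline: Q, K, V each come from one linear map.
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
zero_bias = np.zeros(d_model)
out_linear = attention(
    linear_proj(x, w_q, zero_bias),
    linear_proj(x, w_k, zero_bias),
    linear_proj(x, w_v, zero_bias),
)

# Variant: Q, K, V each come from their own small network.
q_params, k_params, v_params = init_mlp(), init_mlp(), init_mlp()
out_neural = attention(
    mlp_proj(x, *q_params),
    mlp_proj(x, *k_params),
    mlp_proj(x, *v_params),
)
```

Both variants return an output of shape `(seq_len, d_model)`, so under these assumptions the network-based projection is a drop-in replacement for the linear one; only the way Q, K, and V are produced changes, while the attention computation itself is untouched.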
