DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks

  • 2019-07-25 14:03:06
  • Lin Zehui, Pengfei Liu, Luyao Huang, Jie Fu, Junkun Chen, Xipeng Qiu, Xuanjing Huang
Various dropout methods have been designed for the fully-connected layer, convolutional layer, and recurrent layer in neural networks, and have been shown to be effective in avoiding overfitting. As an appealing alternative to recurrent and convolutional layers, the fully-connected self-attention layer surprisingly lacks a specific dropout method. This paper explores the possibility of regularizing the attention weights in Transformers to prevent different contextualized feature vectors from co-adaptation. Experiments on a wide range of tasks show that DropAttention can improve performance and reduce overfitting.
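
To make the idea of regularizing attention weights concrete, the following is a minimal sketch, not the authors' implementation. It assumes that individual softmax attention weights are zeroed at random with probability p and that each row is then renormalized to sum to one; the abstract does not specify the dropping unit or the rescaling rule, so both are assumptions here. The names softmax, drop_attention, and p are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def drop_attention(weights, p, rng):
    """Zero each attention weight with probability p, then renormalize rows.

    Assumption: element-wise dropping followed by row renormalization.
    This is an illustrative choice, not a rule taken from the abstract.
    """
    dropped = []
    for row in weights:
        kept = [w if rng.random() >= p else 0.0 for w in row]
        total = sum(kept)
        if total == 0.0:
            # Every key was dropped for this query: fall back to the original row.
            kept, total = list(row), sum(row)
        dropped.append([w / total for w in kept])
    return dropped


rng = random.Random(0)
scores = [[1.0, 2.0, 0.5], [0.1, 0.1, 3.0]]  # toy query-key scores
weights = [softmax(r) for r in scores]
regularized = drop_attention(weights, p=0.3, rng=rng)
```

Under these assumptions each row of regularized remains a valid probability distribution over keys, so the output vectors stay on the same scale as without dropping. Such a step would be applied only during training.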


