Modeling Selective Feature Attention for Representation-based Siamese Text Matching

Abstract

Representation-based Siamese networks have risen to popularity in lightweight text matching due to their low deployment and inference costs. While word-level attention mechanisms have been implemented within Siamese networks to improve performance, we propose Feature Attention (FA), a novel downstream block designed to enrich the modeling of dependencies among embedding features. Employing "squeeze-and-excitation" techniques, the FA block dynamically adjusts the emphasis on individual features, enabling the network to concentrate more on features that significantly contribute to the final classification. Building upon FA, we introduce a dynamic "selection" mechanism called Selective Feature Attention (SFA), which leverages a stacked BiGRU Inception structure. The SFA block facilitates multi-scale semantic extraction by traversing different stacked BiGRU layers, encouraging the network to selectively concentrate on semantic information and embedding features across varying levels of abstraction. Both the FA and SFA blocks offer a seamless integration capability with various Siamese networks, showcasing a plug-and-play characteristic. Experimental evaluations conducted across diverse text matching baselines and benchmarks underscore the indispensability of modeling feature attention and the superiority of the "selection" mechanism.
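
To make the "squeeze-and-excitation" idea concrete, the following is a minimal sketch of feature-wise gating over a sequence of embeddings. It illustrates the generic squeeze-and-excitation pattern only; the abstract does not specify the FA block's exact layers, so the function name `feature_attention`, the bottleneck size `r`, the mean-pooling squeeze, and the random weights are all assumptions made for illustration.

```python
import math
import random


def feature_attention(seq, w1, w2):
    """Squeeze-and-excitation style gating over embedding features.

    seq: list of T token vectors, each a list of D floats.
    w1:  r x D bottleneck matrix; w2: D x r expansion matrix (both hypothetical).
    Returns the re-weighted sequence and the per-feature gates.
    """
    T, D = len(seq), len(seq[0])
    # Squeeze: average each embedding feature over all token positions.
    z = [sum(tok[d] for tok in seq) / T for d in range(D)]
    # Excitation: bottleneck projection with ReLU, then sigmoid gates in (0, 1).
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h))))
             for row in w2]
    # Scale: every token shares the same per-feature gates.
    out = [[tok[d] * gates[d] for d in range(D)] for tok in seq]
    return out, gates


rng = random.Random(0)
T, D, r = 5, 8, 2  # hypothetical sequence length, embedding size, bottleneck size
seq = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(r)]
w2 = [[rng.uniform(-0.5, 0.5) for _ in range(r)] for _ in range(D)]
out, gates = feature_attention(seq, w1, w2)
```

In a trained network the two weight matrices would be learned, so the gates would come to emphasize the features most useful for the final classification; here they are random and only demonstrate the data flow.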
