Abstract
Representation-based Siamese networks have become popular for lightweight text matching because of their low deployment and inference costs. Whereas prior work has implemented word-level attention mechanisms within Siamese networks to improve performance, we propose Feature Attention (FA), a novel downstream block designed to enrich the modeling of dependencies among embedding features. Employing "squeeze-and-excitation" techniques, the FA block dynamically adjusts the emphasis on individual features, enabling the network to concentrate on the features that contribute most to the final classification. Building upon FA, we introduce a dynamic "selection" mechanism called Selective Feature Attention (SFA), which leverages a stacked BiGRU Inception structure. The SFA block facilitates multi-scale semantic extraction by traversing different stacked BiGRU layers, encouraging the network to selectively concentrate on semantic information and embedding features across varying levels of abstraction. Both the FA and SFA blocks integrate seamlessly with various Siamese networks in a plug-and-play manner. Experimental evaluations across diverse text matching baselines and benchmarks underscore the indispensability of modeling feature attention and the superiority of the "selection" mechanism.
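To make the two mechanisms concrete, the following plain-Python sketch shows one way a squeeze-and-excitation reweighting of embedding features (FA) and a softmax "selection" across branches (SFA) could work. It is an illustration under stated assumptions, not the authors' implementation. The squeeze step is assumed to be mean pooling over token positions. The weight matrices `w1`, `w2` and `w_branch` are hypothetical stand-ins for learned parameters. Each entry of `branches` stands in for the output of a stacked BiGRU of a different depth; no BiGRU is implemented here. The selection step borrows the softmax-across-branches scheme of Selective Kernel Networks as an assumed stand-in, and the paper's exact formulation may differ.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major; token features have shape (seq_len, d)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def squeeze(x: Matrix) -> List[float]:
    """Assumed squeeze: average each embedding feature over token positions."""
    seq_len, d = len(x), len(x[0])
    return [sum(x[t][j] for t in range(seq_len)) / seq_len for j in range(d)]


def feature_attention(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """SE-style sketch: scale every feature column of x by a gate in (0, 1)."""
    z = squeeze(x)                               # squeeze: (d,)
    h = [max(0.0, v) for v in matvec(w1, z)]     # bottleneck + ReLU: (d/r,)
    gates = [sigmoid(v) for v in matvec(w2, h)]  # excitation: (d,)
    return [[xj * gj for xj, gj in zip(row, gates)] for row in x]


def selective_feature_attention(branches: List[Matrix], w1: Matrix,
                                w_branch: List[Matrix]) -> Matrix:
    """SK-style sketch: per-feature softmax selection over branch outputs.

    Hypothetical: each branch stands in for a BiGRU stack of a different depth.
    """
    seq_len, d = len(branches[0]), len(branches[0][0])
    # Fuse the branches by element-wise summation.
    fused = [[sum(b[t][j] for b in branches) for j in range(d)]
             for t in range(seq_len)]
    h = [max(0.0, v) for v in matvec(w1, squeeze(fused))]
    # One logit vector of length d per branch.
    logits = [matvec(wb, h) for wb in w_branch]
    # For each feature j, softmax across branches so the weights sum to 1.
    weights = []
    for j in range(d):
        m = max(lg[j] for lg in logits)
        exps = [math.exp(lg[j] - m) for lg in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Per-feature weighted combination of the branches.
    return [[sum(weights[j][k] * branches[k][t][j] for k in range(len(branches)))
             for j in range(d)] for t in range(seq_len)]


if __name__ == "__main__":
    x = [[1.0, -2.0, 0.5], [3.0, 0.0, -0.5]]  # seq_len = 2, d = 3
    w1 = [[0.5, 0.1, -0.2]]                    # d -> d/r with a single hidden unit
    w2 = [[1.0], [-1.0], [0.0]]                # d/r -> d
    print(feature_attention(x, w1, w2))
```

Because the FA gates lie in (0, 1), the block can only rescale features relative to one another, never amplify them past their input values. In the SFA sketch, the softmax makes the branch weights for each feature sum to one, so every output feature is a convex combination of the corresponding branch features.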