CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction

Abstract

Click-Through Rate (CTR) prediction, a core task in recommendation systems, estimates user click likelihood using historical behavioral data. Modeling user behavior sequences as text to leverage Language Models (LMs) for this task has gained traction, owing to LMs' strong semantic understanding and contextual modeling capabilities. However, a critical structural gap exists: user behavior sequences consist of discrete actions connected by semantically empty separators, differing fundamentally from the coherent natural language in LM pre-training. This mismatch causes semantic fragmentation, where LM attention scatters across irrelevant tokens instead of focusing on meaningful behavior boundaries and inter-behavior relationships, degrading prediction performance. To address this, we propose $\textit{CTR-Sink}$, a novel framework introducing behavior-level attention sinks tailored for recommendation scenarios. Inspired by attention sink theory, it constructs attention focus sinks and dynamically regulates attention aggregation via external information. Specifically, we insert sink tokens between consecutive behaviors, incorporating recommendation-specific signals such as temporal distance to serve as stable attention sinks. To enhance generality, we design a two-stage training strategy that explicitly guides LM attention toward sink tokens and an attention sink mechanism that amplifies inter-sink dependencies to better capture behavioral correlations. Experiments on one industrial dataset and two open-source datasets (MovieLens, Kuairec), alongside visualization results, validate the method's effectiveness across scenarios.
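
The sink-token insertion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `[SINK:...]` token format, the `Behavior` record, the bucket boundaries in `bucket_gap`, and the choice to encode the gap between consecutive behaviors are all assumptions made for illustration, since the abstract only states that sink tokens sit between behaviors and carry signals such as temporal distance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Behavior:
    """Hypothetical record for one user action (not defined in the paper)."""
    text: str       # textual description of the interacted item
    timestamp: int  # time of the interaction, in seconds


def bucket_gap(seconds: int) -> str:
    """Map a time gap to a coarse label. Bucket edges are illustrative."""
    if seconds < 3600:
        return "within_hour"
    if seconds < 86400:
        return "within_day"
    if seconds < 7 * 86400:
        return "within_week"
    return "over_week"


def build_sequence(behaviors: List[Behavior]) -> str:
    """Serialize behaviors as text, replacing the plain separator between
    consecutive behaviors with a sink token that carries their time gap."""
    ordered = sorted(behaviors, key=lambda b: b.timestamp)
    parts: List[str] = []
    for prev, nxt in zip(ordered, ordered[1:]):
        parts.append(prev.text)
        # Assumed token format: one sink token per behavior boundary.
        parts.append(f"[SINK:{bucket_gap(nxt.timestamp - prev.timestamp)}]")
    if ordered:
        parts.append(ordered[-1].text)  # last behavior has no following sink
    return " ".join(parts)


history = [
    Behavior("watched Toy Story", 0),
    Behavior("watched Heat", 1800),
    Behavior("watched Casino", 91800),
]
print(build_sequence(history))
```

In this sketch each behavior boundary becomes an explicit, information-bearing token rather than a semantically empty separator. How the LM is then trained to attend to these tokens (the two-stage strategy and the inter-sink attention mechanism) is outside what this example covers.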
