Abstract
Since their introduction, Transformer-based models, such as SASRec and BERT4Rec, have become common baselines for sequential recommendations, surpassing earlier neural and non-neural methods. A number of subsequent publications have shown that the effectiveness of these models can be improved by, for example, slightly updating the architecture of the Transformer layers, using better training objectives, and employing improved loss functions. However, the additivity of these modular improvements has not been systematically benchmarked; this is the gap we aim to close in this paper. Through our experiments, we identify a very strong model that uses SASRec's training objective, LiGR Transformer layers, and Sampled Softmax Loss. We call this combination eSASRec (Enhanced SASRec). While we primarily focus on realistic, production-like evaluation, in our preliminary study we find that common academic benchmarks show eSASRec to be 23% more effective than the most recent state-of-the-art models, such as ActionPiece. In our main production-like benchmark, eSASRec resides on the Pareto frontier in terms of the accuracy-coverage tradeoff (alongside the recent industrial models HSTU and FuXi). As the modifications compared to the original SASRec are relatively straightforward and no extra features are needed (such as timestamps in HSTU), we believe that eSASRec can be easily integrated into existing recommendation pipelines and can serve as a strong yet very simple baseline for emerging complicated algorithms. To facilitate this, we provide open-source implementations of our models and benchmarks in the repository https://github.com/blondered/transformer_benchmark.