Abstract
Modeling users' multiple interests has become a core problem in real-world recommender systems (RS). Current multi-interest retrieval methods face two major challenges. 1) Interests, typically extracted from predefined external knowledge, are static and fail to evolve dynamically with users' real-time consumption preferences. 2) Online inference typically employs an over-exploitative strategy that mainly matches users' existing interests and lacks proactive exploration and discovery of novel and long-tail interests. To address these challenges, we propose a novel retrieval framework named SPARC (Soft Probabilistic Adaptive Retrieval Model via Codebooks). Our contribution is twofold. First, the framework uses a Residual Quantized Variational Autoencoder (RQ-VAE) to construct a discretized interest space. By training the RQ-VAE jointly with an industrial large-scale recommendation model, it mines behavior-aware interests that perceive user feedback and evolve dynamically. Second, a probabilistic interest module predicts the probability distribution over the entire dynamic, discrete interest space. This enables an efficient "soft-search" strategy during online inference, shifting the retrieval paradigm from "passive matching" to "proactive exploration" and thereby effectively promoting interest discovery. Online A/B tests on an industrial platform with tens of millions of daily active users achieved substantial gains in business metrics: a +0.9% increase in user view duration, a +0.4% increase in user page views (PV), and a +22.7% improvement in PV500 (new content reaching 500 PVs within 24 hours). Offline evaluations on the open-source Amazon Product datasets also show consistent improvements in metrics such as Recall@K and Normalized Discounted Cumulative Gain@K (NDCG@K). Both online and offline experiments validate the efficacy and practical value of the proposed method.
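The abstract only names the two mechanisms, so the following pure-Python sketch illustrates them under our own assumptions and is not SPARC's implementation. Residual quantization is shown as greedy nearest-codeword assignment over fixed toy codebooks; in an actual RQ-VAE the codebooks are learned together with an encoder and decoder. "Soft search" is modeled here, as an assumption, by sampling interest ids without replacement in proportion to their predicted probabilities instead of always taking the highest-scoring ones. The names `residual_quantize`, `softmax` and `soft_search` are hypothetical.

```python
import math
import random


def residual_quantize(x, codebooks):
    """Assign one code per level by greedily quantizing the remaining residual.

    Returns the tuple of code indices (a discrete "interest id") and the
    final residual that the codebooks could not explain.
    """
    residual = list(x)
    codes = []
    for book in codebooks:
        # Nearest codeword to the current residual under squared L2 distance.
        best = min(
            range(len(book)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, book[i])),
        )
        codes.append(best)
        # Subtract the chosen codeword; the next level quantizes what is left.
        residual = [r - c for r, c in zip(residual, book[best])]
    return tuple(codes), residual


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_search(logits, k, rng):
    """Hypothetical soft search: sample k distinct interest ids without
    replacement, each draw weighted by its predicted probability, so
    lower-probability (e.g. long-tail) interests can still be retrieved."""
    probs = softmax(logits)
    pool = list(range(len(probs)))
    chosen = []
    for _ in range(min(k, len(pool))):
        pick = rng.choices(pool, weights=[probs[i] for i in pool], k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen


# Toy usage: two quantization levels over 2-D vectors, four candidate interests.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],    # coarse level
    [[0.0, 0.0], [0.1, -0.1]],   # fine level
]
codes, residual = residual_quantize([1.1, 0.9], codebooks)
picked = soft_search([2.0, 1.0, 0.5, -1.0], k=2, rng=random.Random(0))
```

A hard (arg-max) search would always return the two highest-scoring ids; the sampled variant trades some precision for exploration, which is the behavior the abstract attributes to soft search.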