Abstract
This paper introduces LiNR, LinkedIn's large-scale, GPU-based retrieval system. LiNR supports a billion-sized index on GPU models. We discuss our experiences and challenges in creating scalable, differentiable search indexes using TensorFlow and PyTorch at production scale. In LiNR, both items and model weights are integrated into the model binary. Viewing index construction as a form of model training, we describe scaling our system for large indexes, incorporating full scans and efficient filtering. A key focus is on enabling attribute-based pre-filtering for exhaustive GPU searches, addressing the common challenge of post-filtering in KNN searches that often reduces system quality. We further provide multi-embedding retrieval algorithms and strategies for tackling cold start issues in retrieval. Our advancements in supporting larger indexes through quantization are also discussed. We believe LiNR represents one of the industry's first live-updated model-based retrieval indexes. Applied to out-of-network post recommendations on LinkedIn Feed, LiNR has contributed to a 3% relative increase in professional daily active users. We envisage LiNR as a step towards integrating retrieval and ranking into a single GPU model, simplifying complex infrastructures and enabling end-to-end optimization of the entire differentiable infrastructure through gradient descent.
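
To make the contrast between pre-filtering and post-filtering concrete, the following is a minimal sketch in plain Python, not LiNR's implementation. The function names (`prefilter_topk`, `postfilter_topk`), the toy data, and the boolean `allowed` mask are all hypothetical and chosen for illustration only; a GPU system would presumably express the same idea as a masked matrix multiply followed by a top-k operation.

```python
import heapq
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product similarity between two vectors."""
    return sum(x * y for x, y in zip(a, b))


def prefilter_topk(query: Sequence[float],
                   items: List[Sequence[float]],
                   allowed: List[bool],
                   k: int) -> List[int]:
    """Exhaustive scan that scores only items passing the attribute filter,
    then keeps the k best. Returns up to k item indices."""
    scored = [(dot(query, vec), idx)
              for idx, (vec, ok) in enumerate(zip(items, allowed)) if ok]
    return [idx for _, idx in heapq.nlargest(k, scored)]


def postfilter_topk(query: Sequence[float],
                    items: List[Sequence[float]],
                    allowed: List[bool],
                    k: int) -> List[int]:
    """Take the k best over all items first, then drop those failing the
    filter. May return fewer than k results, possibly none."""
    scored = [(dot(query, vec), idx) for idx, vec in enumerate(items)]
    top = heapq.nlargest(k, scored)
    return [idx for _, idx in top if allowed[idx]]


# Toy index (hypothetical data): the two closest items fail the filter.
items = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
allowed = [False, False, True, True]
query = [1.0, 0.0]

print(prefilter_topk(query, items, allowed, k=2))   # [2, 3]
print(postfilter_topk(query, items, allowed, k=2))  # []
```

In this toy case post-filtering returns nothing because both of the top-2 neighbours are filtered out afterwards, while pre-filtering still returns two eligible items. This is the quality loss that the abstract attributes to post-filtering in KNN search.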