Abstract
Retrieval plays a fundamental role in recommendation systems, search, andnatural language processing (NLP) by efficiently finding relevant items from alarge corpus given a query. Dot products have been widely used as thesimilarity function in such tasks, enabled by Maximum Inner Product Search(MIPS) algorithms for efficient retrieval. However, state-of-the-art retrievalalgorithms have migrated to learned similarities. These advanced approachesencompass multiple query embeddings, complex neural networks, direct item IDdecoding via beam search, and hybrid solutions. Unfortunately, we lackefficient solutions for retrieval in these state-of-the-art setups. Our workaddresses this gap by investigating efficient retrieval techniques withexpressive learned similarity functions. We establish Mixture-of-Logits (MoL)as a universal approximator of similarity functions, demonstrate that MoL'sexpressiveness can be realized empirically to achieve superior performance ondiverse retrieval scenarios, and propose techniques to retrieve the approximatetop-k results using MoL with tight error bounds. Through extensiveexperimentation, we show that MoL, enhanced by our proposed mutualinformation-based load balancing loss, sets new state-of-the-art results acrossheterogeneous scenarios, including sequential retrieval models inrecommendation systems and finetuning language models for question answering;and our approximate top-$k$ algorithms outperform baselines by up to 66x inlatency while achieving >.99 recall rate compared to exact algorithms.