Abstract
Vector embeddings have been tasked with an ever-increasing set of retrievaltasks over the years, with a nascent rise in using them for reasoning,instruction-following, coding, and more. These new benchmarks push embeddingsto work for any query and any notion of relevance that could be given. Whileprior works have pointed out theoretical limitations of vector embeddings,there is a common assumption that these difficulties are exclusively due tounrealistic queries, and those that are not can be overcome with bettertraining data and larger models. In this work, we demonstrate that we mayencounter these theoretical limitations in realistic settings with extremelysimple queries. We connect known results in learning theory, showing that thenumber of top-k subsets of documents capable of being returned as the result ofsome query is limited by the dimension of the embedding. We empirically showthat this holds true even if we restrict to k=2, and directly optimize on thetest set with free parameterized embeddings. We then create a realistic datasetcalled LIMIT that stress tests models based on these theoretical results, andobserve that even state-of-the-art models fail on this dataset despite thesimple nature of the task. Our work shows the limits of embedding models underthe existing single vector paradigm and calls for future research to developmethods that can resolve this fundamental limitation.