The Case for Learned Index Structures

Abstract

Indexes are models: a B-Tree-Index can be seen as a model to map a key to theposition of a record within a sorted array, a Hash-Index as a model to map akey to a position of a record within an unsorted array, and a BitMap-Index as amodel to indicate if a data record exists or not. In this exploratory researchpaper, we start from this premise and posit that all existing index structurescan be replaced with other types of models, including deep-learning models,which we term learned indexes. The key idea is that a model can learn the sortorder or structure of lookup keys and use this signal to effectively predictthe position or existence of records. We theoretically analyze under whichconditions learned indexes outperform traditional index structures and describethe main challenges in designing learned index structures. Our initial resultsshow, that by using neural nets we are able to outperform cache-optimizedB-Trees by up to 70% in speed while saving an order-of-magnitude in memory overseveral real-world data sets. More importantly though, we believe that the ideaof replacing core components of a data management system through learned modelshas far reaching implications for future systems designs and that this workjust provides a glimpse of what might be possible.

Quick Read (beta)

loading the full paper ...