Discovering Data Structures: Nearest Neighbor Search and Beyond

Abstract

We propose a general framework for end-to-end learning of data structures.Our framework adapts to the underlying data distribution and providesfine-grained control over query and space complexity. Crucially, the datastructure is learned from scratch, and does not require careful initializationor seeding with candidate data structures/algorithms. We first apply thisframework to the problem of nearest neighbor search. In several settings, weare able to reverse-engineer the learned data structures and query algorithms.For 1D nearest neighbor search, the model discovers optimal distribution(in)dependent algorithms such as binary search and variants of interpolationsearch. In higher dimensions, the model learns solutions that resemble k-dtrees in some regimes, while in others, they have elements oflocality-sensitive hashing. The model can also learn useful representations ofhigh-dimensional data and exploit them to design effective data structures. Wealso adapt our framework to the problem of estimating frequencies over a datastream, and believe it could also be a powerful discovery tool for newproblems.

Quick Read (beta)

loading the full paper ...