Translating synthetic natural language to database queries: a polyglot deep learning framework

Abstract

The number of databases, as well as their size and complexity, is increasing. This creates a barrier to use, especially for non-experts, who have to come to grips with the nature of the data, the way it has been represented in the database, and the specific query languages or user interfaces by which data are accessed. These difficulties worsen in research settings, where it is common to work with many different databases. One approach to improving this situation is to allow users to pose their queries in natural language. In this work we describe a machine learning framework, Polyglotter, that in a general way supports the mapping of natural language searches to database queries. Importantly, it does not require the creation of manually annotated data for training and therefore can be applied easily to multiple domains. The framework is polyglot in the sense that it supports multiple different database engines that are accessed with a variety of query languages, including SQL and Cypher. Furthermore, Polyglotter also supports multi-class queries. Our results indicate that our framework performs well on both synthetic and real databases, and may provide opportunities for database maintainers to improve accessibility to their resources.
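
To make the idea of training without manually annotated data more concrete, the following is a minimal sketch of how synthetic pairs of natural language questions and queries might be generated from a schema, with one target each for SQL and Cypher. It is an illustration under stated assumptions, not Polyglotter's actual generation procedure: the `TrainingPair` class, the `make_pairs` function, the question template, and the `gene`/`symbol` schema are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    """One synthetic example: a question plus its query in two languages."""
    question: str
    sql: str
    cypher: str


def make_pairs(table: str, column: str, values: list[str]) -> list[TrainingPair]:
    """Fill one hypothetical template per value to produce training pairs.

    The template wording is an assumption made for illustration only.
    Queries are built by string formatting because they serve as training
    targets here; they are not meant to be executed with untrusted input.
    """
    pairs = []
    for value in values:
        question = f"find {table} where {column} is {value}"
        sql = f"SELECT * FROM {table} WHERE {column} = '{value}'"
        cypher = f"MATCH (n:{table}) WHERE n.{column} = '{value}' RETURN n"
        pairs.append(TrainingPair(question, sql, cypher))
    return pairs


# Hypothetical schema: a "gene" table (or node label) with a "symbol" field.
pairs = make_pairs("gene", "symbol", ["BRCA1", "TP53"])
for pair in pairs:
    print(pair.question)
    print("  SQL:   ", pair.sql)
    print("  Cypher:", pair.cypher)
```

Because each question is paired with a query in both languages, the same synthetic corpus could in principle be used to train a model for either database engine, which is one plausible reading of the "polyglot" property described above.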
