DBCopilot: Scaling Natural Language Querying to Massive Databases

  • 2023-12-06 12:37:28
  • Tianshu Wang, Hongyu Lin, Xianpei Han, Le Sun, Xiaoyang Chen, Hao Wang, Zhenyu Zeng

Abstract

Text-to-SQL simplifies database interactions by enabling non-experts to convert their natural language (NL) questions into Structured Query Language (SQL) queries. While recent advances in large language models (LLMs) have improved the zero-shot text-to-SQL paradigm, existing methods face scalability challenges when dealing with massive, dynamically changing databases. This paper introduces DBCopilot, a framework that addresses these challenges by employing a compact and flexible copilot model for routing across massive databases. Specifically, DBCopilot decouples the text-to-SQL process into schema routing and SQL generation, leveraging a lightweight sequence-to-sequence neural network-based router to formulate database connections and navigate natural language questions through databases and tables. The routed schemas and questions are then fed into LLMs for efficient SQL generation. Furthermore, DBCopilot introduces a reverse schema-to-question generation paradigm, which can learn and adapt the router over massive databases automatically without requiring manual intervention. Experimental results demonstrate that DBCopilot is a scalable and effective solution for real-world text-to-SQL tasks, providing a significant advancement in handling large-scale schemas.
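
The two-stage split described in the abstract (schema routing first, then SQL generation by an LLM) can be illustrated with a minimal sketch. This is not DBCopilot's implementation: the paper's router is a sequence-to-sequence neural network, whereas the `route_schema` function below is a hypothetical stand-in that ranks tables by simple word overlap. The `Table` class, `build_prompt`, and the prompt layout are likewise assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Table:
    database: str
    name: str
    columns: list[str]


def route_schema(question: str, tables: list[Table], top_k: int = 1) -> list[Table]:
    """Toy router (assumption): rank tables by word overlap with the question.

    The paper uses a learned sequence-to-sequence router; keyword overlap is
    swapped in here only so the sketch runs without a trained model.
    """
    words = set(question.lower().replace("?", "").split())

    def score(table: Table) -> int:
        vocab = {table.database.lower(), table.name.lower()}
        vocab.update(column.lower() for column in table.columns)
        return len(words & vocab)

    # sorted() is stable, so ties keep their original order.
    return sorted(tables, key=score, reverse=True)[:top_k]


def build_prompt(question: str, routed: list[Table]) -> str:
    """Put only the routed schema into the prompt handed to an LLM."""
    schema = "\n".join(
        f"{t.database}.{t.name}({', '.join(t.columns)})" for t in routed
    )
    return f"Schema:\n{schema}\nQuestion: {question}\nSQL:"


if __name__ == "__main__":
    catalog = [
        Table("shop", "orders", ["id", "customer", "total"]),
        Table("hr", "employees", ["id", "name", "salary"]),
    ]
    question = "What is the average salary of employees?"
    print(build_prompt(question, route_schema(question, catalog)))
```

The point of the split is that the LLM prompt carries only the routed tables rather than the full catalog, which is what keeps the prompt small as the number of databases grows.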

 
