CRED-SQL: Enhancing Real-world Large Scale Database Text-to-SQL Parsing through Cluster Retrieval and Execution Description

  • 2025-08-20 08:11:10
  • Shaoming Duan, Zirui Wang, Chuanyi Liu, Zhibin Zhu, Yuhao Zhang, Peiyi Han, Liang Yan, Zewu Peng

Abstract

Recent advances in large language models (LLMs) have significantly improved the accuracy of Text-to-SQL systems. However, a critical challenge remains: the semantic mismatch between natural language questions (NLQs) and their corresponding SQL queries. This issue is exacerbated in large-scale databases, where semantically similar attributes hinder schema linking and semantic drift arises during SQL generation, ultimately reducing model accuracy. To address these challenges, we introduce CRED-SQL, a framework for large-scale databases that integrates Cluster Retrieval and Execution Description. CRED-SQL first performs cluster-based large-scale schema retrieval to pinpoint the tables and columns most relevant to a given NLQ, alleviating schema mismatch. It then introduces an intermediate natural language representation, the Execution Description Language (EDL), to bridge the gap between NLQs and SQL. This reformulation decomposes the task into two stages, Text-to-EDL and EDL-to-SQL, leveraging the strong general reasoning capabilities of LLMs while reducing semantic deviation. Extensive experiments on two large-scale, cross-domain benchmarks, SpiderUnion and BirdUnion, demonstrate that CRED-SQL achieves new state-of-the-art (SOTA) performance, validating its effectiveness and scalability. Our code is available at https://github.com/smduan/CRED-SQL.git
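
The abstract does not state which embedding model, clustering algorithm, or cut-offs CRED-SQL uses for its cluster-based schema retrieval, so the following is only a minimal, hypothetical sketch of the general coarse-to-fine idea: columns are grouped into clusters, the question is first matched against cluster centroids, and only columns inside the best-matching clusters are ranked. Assumptions (not from the paper): bag-of-words vectors stand in for a learned text encoder, plain k-means does the grouping, and the helper names `embed`, `cluster_columns`, and `retrieve` are illustrative. It does not model the Text-to-EDL or EDL-to-SQL stages; the authors' actual implementation is in the linked repository.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy embedding: bag of lowercase alphanumeric tokens.

    A real system would use a learned sentence encoder instead (assumption).
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summed(vectors: list[Counter]) -> Counter:
    """Sum of member vectors; cosine is scale-invariant, so it ranks like the mean."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def cluster_columns(columns: list[str], k: int, iterations: int = 10):
    """Plain k-means over column vectors with deterministic, evenly spaced init."""
    k = min(k, len(columns))
    vectors = [embed(c) for c in columns]
    step = max(1, len(columns) // k)
    centroids = [vectors[i * step] for i in range(k)]
    assignment = [0] * len(columns)
    for _ in range(iterations):
        # Assign each column to its most similar centroid (ties go to the lower index).
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append(summed(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    clusters = [[col for col, a in zip(columns, assignment) if a == c] for c in range(k)]
    return clusters, centroids


def retrieve(question: str, clusters, centroids, top_clusters: int = 1, top_columns: int = 3):
    """Coarse step ranks clusters; fine step ranks columns inside the chosen clusters."""
    q = embed(question)
    best = sorted(range(len(centroids)), key=lambda c: cosine(q, centroids[c]), reverse=True)
    candidates = [col for c in best[:top_clusters] for col in clusters[c]]
    candidates.sort(key=lambda col: cosine(q, embed(col)), reverse=True)
    return candidates[:top_columns]


if __name__ == "__main__":
    cols = ["singer.name", "singer.age", "concert.year", "concert.venue"]
    clusters, centroids = cluster_columns(cols, k=2)
    print(clusters)  # [['singer.name', 'singer.age'], ['concert.year', 'concert.venue']]
    print(retrieve("What is the age of every singer?", clusters, centroids))
    # ['singer.age', 'singer.name']
```

The point of the two-level design is that the question is compared against a handful of centroids before any individual column is scored, so the fine-grained ranking only touches a small slice of a large schema. Whether CRED-SQL prunes its schema this way, or with different similarity measures and thresholds, is not stated in the abstract.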
