Automatic database description generation for Text-to-SQL

  • 2025-02-28 02:23:06
  • Yingqi Gao, Zhiling Luo

Abstract

In the context of the Text-to-SQL task, table and column descriptions are crucial for bridging the gap between natural language and database schema. This report proposes a method for automatically generating effective database descriptions when explicit descriptions are unavailable. The proposed method employs a dual-process approach: a coarse-to-fine process, followed by a fine-to-coarse process. The coarse-to-fine process leverages the inherent knowledge of an LLM to guide understanding from the database level to tables and finally to columns. This provides a holistic understanding of the database structure and ensures contextual alignment. Conversely, the fine-to-coarse process starts at the column level, offering a more accurate and nuanced understanding when stepping back to the table level. Experimental results on the Bird benchmark indicate that using descriptions generated by the proposed method improves SQL generation accuracy by 0.93% compared to not using descriptions, and achieves 37% of human-level performance. The source code is publicly available at https://github.com/XGenerationLab/XiYan-DBDescGen.
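
The two passes described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `describe_database`, the prompt wording, the `Schema` type, and the `llm` callable (any function that maps a prompt string to a completion string) are all hypothetical and do not come from the XiYan-DBDescGen repository.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical schema representation: table name -> list of column names.
Schema = Dict[str, List[str]]


def describe_database(
    db_name: str, schema: Schema, llm: Callable[[str], str]
) -> dict:
    """Illustrative two-pass description generation (assumed design)."""
    # Pass 1, coarse-to-fine: database -> tables -> columns.
    # Each level is prompted with the description of the level above it.
    db_summary = llm(
        f"Summarize the purpose of database '{db_name}' "
        f"with tables: {', '.join(schema)}"
    )
    table_drafts: Dict[str, str] = {}
    column_descs: Dict[Tuple[str, str], str] = {}
    for table, columns in schema.items():
        table_drafts[table] = llm(
            f"Database context: {db_summary}\n"
            f"Describe table '{table}' with columns: {', '.join(columns)}"
        )
        for col in columns:
            column_descs[(table, col)] = llm(
                f"Table context: {table_drafts[table]}\n"
                f"Describe column '{table}.{col}'"
            )

    # Pass 2, fine-to-coarse: step back from column descriptions
    # to a refined description of each table.
    table_descs: Dict[str, str] = {}
    for table, columns in schema.items():
        details = "\n".join(
            f"{col}: {column_descs[(table, col)]}" for col in columns
        )
        table_descs[table] = llm(
            f"Column descriptions:\n{details}\n"
            f"Refine the description of table '{table}'"
        )

    return {
        "database": db_summary,
        "tables": table_descs,
        "columns": column_descs,
    }
```

In this sketch the number of LLM calls is one for the database, two per table (draft and refinement), and one per column. A real system would likely also pass sample values or column types into the column-level prompts, which the abstract does not specify.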

 
