Abstract
In the context of the Text-to-SQL task, table and column descriptions are crucial for bridging the gap between natural language and the database schema. This report proposes a method for automatically generating effective database descriptions when explicit descriptions are unavailable. The proposed method employs a dual-process approach: a coarse-to-fine process followed by a fine-to-coarse process. The coarse-to-fine process leverages the inherent knowledge of the LLM to guide understanding from the database level to tables and finally to columns. This provides a holistic understanding of the database structure and ensures contextual alignment. Conversely, the fine-to-coarse process starts at the column level, offering a more accurate and nuanced understanding when stepping back to the table level. Experimental results on the Bird benchmark indicate that using descriptions generated by the proposed method improves SQL generation accuracy by 0.93\% compared to not using descriptions, and achieves 37\% of human-level performance. The source code is publicly available at https://github.com/XGenerationLab/XiYan-DBDescGen.
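The dual-process flow described above can be illustrated with a minimal Python sketch. It is not the released implementation: the function names, the prompt wording, the `Schema` layout, and the `llm` callable (any function that maps a prompt string to a text reply) are all assumptions made for illustration. The sketch also assumes that the fine-to-coarse pass reuses the column descriptions produced by the coarse-to-fine pass, which the abstract does not specify.

```python
from typing import Callable, Dict, List

# Hypothetical types, not taken from the paper or its repository.
Schema = Dict[str, List[str]]   # table name -> list of column names
LLM = Callable[[str], str]      # prompt in, generated text out


def coarse_to_fine(db_name: str, schema: Schema, llm: LLM) -> dict:
    """Database -> tables -> columns: each level is prompted with the
    description produced one level above it."""
    db_summary = llm(
        f"Summarize database '{db_name}' with tables: {', '.join(schema)}"
    )
    result = {}
    for table, columns in schema.items():
        table_desc = llm(
            f"Context: {db_summary}\n"
            f"Describe table '{table}' with columns: {', '.join(columns)}"
        )
        column_descs = {
            col: llm(f"Context: {table_desc}\nDescribe column '{table}.{col}'")
            for col in columns
        }
        result[table] = {"table": table_desc, "columns": column_descs}
    return result


def fine_to_coarse(coarse: dict, llm: LLM) -> dict:
    """Columns -> table: rewrite each table description from the
    column-level descriptions (assumed to come from the first pass)."""
    refined = {}
    for table, entry in coarse.items():
        column_text = "; ".join(
            f"{col}: {desc}" for col, desc in entry["columns"].items()
        )
        table_desc = llm(
            f"Given column descriptions [{column_text}], "
            f"describe table '{table}'"
        )
        refined[table] = {"table": table_desc, "columns": dict(entry["columns"])}
    return refined


def generate_descriptions(db_name: str, schema: Schema, llm: LLM) -> dict:
    """Run the coarse-to-fine pass, then the fine-to-coarse pass."""
    return fine_to_coarse(coarse_to_fine(db_name, schema, llm), llm)
```

In this sketch a database with T tables and C columns in total costs 1 + T + C model calls in the first pass and T calls in the second. Passing the model as a plain callable keeps the control flow testable with a stub before a real LLM client is wired in.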