LLM-Generated Text-to-SQL Verification Methods and Benchmark
Source: DataCite Commons | Updated: 2025-08-22 | Indexed: 2025-09-08
Download link:
https://figshare.com/articles/dataset/LLM-Generated_Text-to-SQL_Verification_Methods_and_Benchmark/29896328/3
Description:
This repository contains the source code, evaluation results, and a new verification benchmark developed for our research on verifying LLM-generated SQL. We propose and evaluate two novel LLM-based SQL verification methods. To advance research in this area, we have constructed a dedicated LLM-generated Text-to-SQL verification benchmark. Traditional generation-focused benchmarks contain only "gold" SQL queries. Our benchmark instead provides, for a given natural language question, a mix of correct and incorrect SQL queries, each with its label. This enables a more comprehensive evaluation of verification algorithms, because both the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) can be measured.

The benchmark is derived from the development sets of three popular Text-to-SQL generation benchmarks: BIRD, Spider, and KaggleDBQA. It includes the original natural language questions, the database schemas, and our generated candidate SQL queries, each labeled as correct or incorrect based on execution-based ground truth. We hope this dataset contributes to the ongoing improvement of Text-to-SQL verification systems.

We are also sharing the complete source code for our proposed SQL verification methods, as well as an implementation of the SQL Critique baseline, a method commonly used in recent Text-to-SQL generation pipelines. The code allows our verification experiments to be reproduced and other verification algorithms to be evaluated on the new benchmark.
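The description does not define FAR and FRR or spell out the execution-based labeling rule. The following Python sketch illustrates both ideas under stated assumptions. It uses the standard definitions: FAR is the share of incorrect queries the verifier accepts, and FRR is the share of correct queries it rejects. For labeling, it uses one common convention: a candidate counts as correct if it executes and returns the same rows as the gold query, compared as an unordered multiset. The benchmark's actual matching rule may differ. The function names `execution_match` and `far_frr`, the toy `singer` table, and the verifier decisions are hypothetical and do not come from the repository's code.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, candidate_sql: str, gold_sql: str) -> bool:
    """Hypothetical labeler: a candidate is correct if it returns the gold rows.

    Rows are compared as unordered multisets (an assumption; some benchmarks
    compare plain sets or respect ORDER BY).
    """
    try:
        candidate_rows = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute is labeled incorrect
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(candidate_rows) == Counter(gold_rows)


def far_frr(labels: list[bool], accepted: list[bool]) -> tuple[float, float]:
    """Return (FAR, FRR) for a verifier's accept/reject decisions.

    labels[i]   -- ground truth: True if candidate i is correct
    accepted[i] -- True if the verifier accepted candidate i
    FAR = accepted incorrect / all incorrect
    FRR = rejected correct / all correct
    """
    on_incorrect = [acc for label, acc in zip(labels, accepted) if not label]
    on_correct = [acc for label, acc in zip(labels, accepted) if label]
    far = sum(on_incorrect) / len(on_incorrect) if on_incorrect else 0.0
    frr = sum(not acc for acc in on_correct) / len(on_correct) if on_correct else 0.0
    return far, frr


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE singer (name TEXT, age INTEGER);"
        "INSERT INTO singer VALUES ('Ann', 30), ('Bo', 25), ('Cy', 41);"
    )
    gold = "SELECT name FROM singer WHERE age > 28"
    candidates = [
        "SELECT name FROM singer WHERE age >= 29",  # same rows -> correct
        "SELECT name FROM singer WHERE age > 20",   # extra row -> incorrect
        "SELECT nme FROM singer",                   # execution error -> incorrect
    ]
    labels = [execution_match(conn, c, gold) for c in candidates]
    print(labels)  # [True, False, False]

    # Hypothetical verifier decisions: it wrongly accepts candidate 2.
    accepted = [True, True, False]
    print(far_frr(labels, accepted))  # (0.5, 0.0)
```

Counting rows with `Counter` rather than `set` keeps duplicate rows significant, so a query that returns a row twice does not match a gold query that returns it once. Reporting FAR and FRR separately matters for verification because they trade off against each other. A verifier that accepts everything has an FRR of 0 but an FAR of 1.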
Provider:
figshare
Created:
2025-08-22



