
Text-to-SQL Verification Methods and Benchmark

Source: DataCite Commons. Updated 2025-08-22. Indexed 2025-09-08.
Download link:
https://figshare.com/articles/dataset/LLM-Generated_Text-to-SQL_Verification_Methods_and_Benchmark/29896328/2
Description:
This repository contains the source code, evaluation results, and a new verification benchmark developed for our research on LLM-generated SQL verification.

We propose and evaluate two novel LLM-based SQL verification methods. To advance research in this area, we have constructed a dedicated LLM-generated text-to-SQL verification benchmark. Unlike traditional generation-focused benchmarks, which only contain "gold" SQL queries, our benchmark provides a mix of labeled correct and incorrect SQL queries for a given natural language question. This allows for a more comprehensive evaluation of verification algorithms, enabling the measurement of both False Acceptance Rate (FAR) and False Rejection Rate (FRR).

This benchmark is derived from the development sets of three popular Text-to-SQL generation benchmarks: BIRD, Spider, and KaggleDBQA. We hope that by providing this dataset, we can contribute to the ongoing improvement of Text-to-SQL verification systems. The benchmark data includes the original natural language questions, database schemas, and our generated candidate SQL queries, each labeled as either correct or incorrect based on execution-based ground truth.

We are also sharing the complete source code for our proposed SQL verification methods, as well as the implementation for the SQL Critique baseline method, which is commonly used in recent Text-to-SQL generation pipelines. The code allows for the reproduction of our verification experiments and the evaluation of other verification algorithms on our new benchmark.
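The description mentions two ideas that a small example may make concrete: labeling a candidate query by execution against a gold query, and scoring a verifier by FAR and FRR. The sketch below is illustrative only and is not the repository's code. It assumes an order-insensitive comparison of result rows, assumes a candidate that fails to execute counts as incorrect, and uses the usual definitions (FAR = accepted incorrect queries / all incorrect queries, FRR = rejected correct queries / all correct queries). The `singer` table, the candidate queries, the verifier decisions, and the function names `execution_label` and `far_frr` are all hypothetical.

```python
import sqlite3
from collections import Counter


def execution_label(conn, candidate_sql, gold_sql):
    """Return True if the candidate yields the same multiset of rows as the gold query.

    Assumption: row order is ignored, and a candidate that raises an
    SQLite error is labeled incorrect.
    """
    try:
        candidate_rows = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(candidate_rows) == Counter(gold_rows)


def far_frr(labels, decisions):
    """Compute (FAR, FRR).

    labels[i] is True if candidate i is correct.
    decisions[i] is True if the verifier accepted candidate i.
    """
    on_incorrect = [d for l, d in zip(labels, decisions) if not l]
    on_correct = [d for l, d in zip(labels, decisions) if l]
    # FAR: share of incorrect candidates that the verifier accepted.
    far = sum(on_incorrect) / len(on_incorrect) if on_incorrect else 0.0
    # FRR: share of correct candidates that the verifier rejected.
    frr = sum(not d for d in on_correct) / len(on_correct) if on_correct else 0.0
    return far, frr


# Hypothetical toy database standing in for a benchmark schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 52);
    """
)

gold = "SELECT name FROM singer WHERE age > 40"
candidates = [
    "SELECT name FROM singer WHERE age > 40 ORDER BY name",  # same rows as gold
    "SELECT name FROM singer WHERE age >= 30",               # extra row
    "SELECT name FROM singer WHERE NOT age <= 40",           # same rows as gold
    "SELECT nme FROM singer",                                # fails to execute
]

labels = [execution_label(conn, sql, gold) for sql in candidates]

# Hypothetical accept/reject decisions from some verifier under test.
decisions = [True, True, False, False]

far, frr = far_frr(labels, decisions)
print(far, frr)  # 0.5 0.5
```

Execution-based labels depend on the database contents: two queries that differ semantically can still return identical rows on a given instance, so a label of this kind reflects agreement on that instance only.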
Provider:
figshare
Created:
2025-08-21