OpenDCAI/dataflow-demo-Text2SQL

Name: OpenDCAI/dataflow-demo-Text2SQL
Creator: OpenDCAI
Published: 2026-02-04 13:41:48
License: No description provided

Source: Hugging Face. Updated: 2026-02-04. Indexed: 2026-01-03.

Download link:

https://hf-mirror.com/datasets/OpenDCAI/dataflow-demo-Text2SQL

Description:

This dataset is part of the DataFlow project and includes multiple JSON splits showcasing common Text-to-SQL training data formats, covering raw inputs, refined outputs, and augmented samples. It can be used to train and enhance the Text-to-SQL generation capabilities of large language models and to improve their generalization on Text-to-SQL tasks. The splits are input_example (400 records, demonstrating the seed data format), output_example (1368 records, demonstrating the augmented data format), sqlflow_bird (37521 records, derived from the BIRD training set), sqlflow_ehrsql (14491 records, derived from the EHRSQL training set), and sqlflow_spider (37537 records, derived from the Spider training set). The dataset's fields include the database identifier (db_id), natural language question (question), SQL query (sql), reasoning trace (cot), external knowledge (external_knowledge), full prompt context (prompt), question style tag (question_style), and difficulty annotations (sql_component_difficulty and sql_execution_difficulty), among others.
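The splits and fields listed above can be sketched as a small Python schema. This is only an illustration under stated assumptions: the field names and split sizes come from the description, but the field types, which fields are optional, and the one-JSON-object-per-record layout are guesses, and the sample record is made up rather than taken from the dataset.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional

# Split names and record counts as stated in the description.
SPLIT_SIZES = {
    "input_example": 400,
    "output_example": 1368,
    "sqlflow_bird": 37521,
    "sqlflow_ehrsql": 14491,
    "sqlflow_spider": 37537,
}


@dataclass
class Text2SQLRecord:
    # Field names follow the description; the types are assumptions.
    db_id: Optional[str] = None
    question: Optional[str] = None
    sql: Optional[str] = None
    # Assumed optional: seed-format records may not carry these fields.
    cot: Optional[str] = None
    external_knowledge: Optional[str] = None
    prompt: Optional[str] = None
    question_style: Optional[str] = None
    sql_component_difficulty: Optional[str] = None
    sql_execution_difficulty: Optional[str] = None


def parse_record(line: str) -> Text2SQLRecord:
    """Parse one JSON object, keeping only the fields named in the description."""
    raw = json.loads(line)
    known = {f.name: raw.get(f.name) for f in fields(Text2SQLRecord)}
    return Text2SQLRecord(**known)


# Hypothetical record for illustration only; not copied from the dataset.
sample = json.dumps({
    "db_id": "demo_db",
    "question": "How many users are there?",
    "sql": "SELECT COUNT(*) FROM users;",
    "question_style": "formal",
    "unlisted_field": 1,
})

record = parse_record(sample)
total_records = sum(SPLIT_SIZES.values())
print(record.db_id, record.sql, total_records)
```

Unlisted keys are dropped and missing ones default to `None`, which keeps the parser tolerant of the "among others" fields the description leaves unnamed.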

Provider:

OpenDCAI
