motherduckdb/pg-docbench

Source: Hugging Face. Updated 2026-01-13. Indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/motherduckdb/pg-docbench
Overview:
DocBench is a synthetic Text-to-SQL benchmark dataset of 4,398 question/SQL pairs derived from the PostgreSQL documentation, designed to probe language models' knowledge of PostgreSQL-specific SQL functionality. The dataset covers functions, aggregates, operators, statements, keywords, and multi-keyword expressions available in PostgreSQL 14 and PostgreSQL 18, including extensions such as pgvector. Each example contains a category (difficulty level), a natural language question or instruction, the ground truth SQL query, setup SQL, validation SQL, the type of SQL construct being tested, the name of the specific function or feature, the PostgreSQL version, and additional metadata. The dataset files include the benchmark dataset (pg-synth.jsonl) and distilled documentation for each construct (pg-distilled-docs.jsonl).
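Because the benchmark ships as a JSON Lines file, a quick way to get a feel for it is to parse the records and count them by one of the described fields. The following is a minimal sketch, not the dataset's official loader. The key names (`category`, `question`, `sql`, `type`, `name`) and the sample records are assumptions made for illustration; check the actual keys in pg-synth.jsonl before relying on them.

```python
import json
from collections import Counter
from typing import Iterable


def read_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def count_by(records: list[dict], key: str) -> Counter:
    """Count records per value of `key`; records lacking the key fall under None."""
    return Counter(rec.get(key) for rec in records)


# Hypothetical sample lines. The key names are assumptions, not the
# confirmed schema of pg-synth.jsonl.
sample = [
    '{"category": "easy", "question": "Return the absolute value of -5.", '
    '"sql": "SELECT abs(-5);", "type": "function", "name": "abs"}',
    "",  # blank lines are skipped
    '{"category": "hard", "question": "Concatenate all names with commas.", '
    '"sql": "SELECT string_agg(name, \',\') FROM t;", "type": "aggregate", '
    '"name": "string_agg"}',
]

records = read_jsonl(sample)
by_category = count_by(records, "category")
by_type = count_by(records, "type")
print(by_category)
print(by_type)
```

To run the same functions on a downloaded copy of the file, open it and pass the file object directly, for example `with open("pg-synth.jsonl", encoding="utf-8") as f: records = read_jsonl(f)`, since a text file iterates line by line.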
Provider:
motherduckdb