
davanstrien/similarity-dataset-test-sql

Source: Hugging Face. Updated 2024-05-25; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/davanstrien/similarity-dataset-test-sql
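Overview:
This dataset was generated with distilabel and includes a `pipeline.yaml` file that can be used to reproduce the pipeline that generated it. Each record has four main fields: `anchor`, `positive`, `negative`, and `generation`, holding the anchor text, the positive example, the negative example, and the generated text, respectively. The dataset has a single configuration, `default`, and can be loaded with the Hugging Face `load_dataset` function.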

Provider:
davanstrien
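Summary of original information

Dataset Card for similarity-dataset-test-sql

Dataset Overview

This dataset contains a `pipeline.yaml` file that can be used to reproduce, in distilabel, the pipeline that generated it: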

```console
distilabel pipeline run --config "https://huggingface.co/datasets/davanstrien/similarity-dataset-test-sql/raw/main/pipeline.yaml"
```

Or explore the configuration:

```console
distilabel pipeline info --config "https://huggingface.co/datasets/davanstrien/similarity-dataset-test-sql/raw/main/pipeline.yaml"
```
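If the two commands above need to be driven from a script, a minimal sketch could build the argument list in Python. The helper name `distilabel_command` and the constant `CONFIG_URL` are hypothetical names introduced here for illustration; only the command shape and the URL come from the dataset card.

```python
import shlex

# Raw URL of the pipeline config, as given on the dataset card.
CONFIG_URL = (
    "https://huggingface.co/datasets/davanstrien/"
    "similarity-dataset-test-sql/raw/main/pipeline.yaml"
)


def distilabel_command(action: str, config_url: str = CONFIG_URL) -> list[str]:
    """Build the argv list for `distilabel pipeline <action> --config <url>`.

    Hypothetical helper: only the `run` and `info` actions shown on the
    dataset card are accepted.
    """
    if action not in {"run", "info"}:
        raise ValueError(f"unsupported action: {action!r}")
    return ["distilabel", "pipeline", action, "--config", config_url]


info_cmd = distilabel_command("info")
print(shlex.join(info_cmd))

# To actually execute it (requires distilabel to be installed):
# import subprocess
# subprocess.run(info_cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with the URL.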

Dataset Structure

Examples in each configuration have the following structure:

<details><summary> Configuration: default </summary><hr>

```json
{
  "anchor": "What is the total volume of timber sold by each salesperson, sorted by salesperson?",
  "generation": "{ \"bad\": [ \"What is the average volume of timber sold by each salesperson, sorted by salesperson?\", \"What is the total volume of timber sold by each salesperson, sorted by volume?\", \"What is the total volume of timber sold by each salesperson, without sorting?\" ], \"good\": [ \"What is the total quantity of timber sold by each salesperson, ordered by salesperson?\", \"List the total amount of timber sold by each salesperson, sorted by salesperson.\", \"Provide the total volume of timber sold by each salesperson, arranged by salesperson.\" ] }",
  "negative": "What is the average volume of timber sold by each salesperson, sorted by salesperson?",
  "positive": "What is the total quantity of timber sold by each salesperson, ordered by salesperson?"
}
```
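In the example, `generation` appears to be a JSON-encoded string rather than a nested object, so reading its `good` and `bad` lists takes a second decoding step. The following is a minimal sketch built from the example record; the variable names are illustrative. That `positive` and `negative` appear in the `good` and `bad` lists is observed in this one example only, not something the dataset card states as a rule.

```python
import json

# Example record from the dataset card. `generation` is rebuilt with
# json.dumps so that it is a JSON-encoded string (an assumption about how
# the field is stored, based on the example).
generation = json.dumps({
    "bad": [
        "What is the average volume of timber sold by each salesperson, sorted by salesperson?",
        "What is the total volume of timber sold by each salesperson, sorted by volume?",
        "What is the total volume of timber sold by each salesperson, without sorting?",
    ],
    "good": [
        "What is the total quantity of timber sold by each salesperson, ordered by salesperson?",
        "List the total amount of timber sold by each salesperson, sorted by salesperson.",
        "Provide the total volume of timber sold by each salesperson, arranged by salesperson.",
    ],
})
record = {
    "anchor": "What is the total volume of timber sold by each salesperson, sorted by salesperson?",
    "generation": generation,
    "negative": "What is the average volume of timber sold by each salesperson, sorted by salesperson?",
    "positive": "What is the total quantity of timber sold by each salesperson, ordered by salesperson?",
}

# Second decoding step: turn the string into a dict of candidate lists.
candidates = json.loads(record["generation"])
good, bad = candidates["good"], candidates["bad"]

print(len(good), len(bad))
print(record["positive"] in good, record["negative"] in bad)
```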

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("davanstrien/similarity-dataset-test-sql", "default")
```

Or simply as follows, since there is only one configuration and it is named `default`:

```python
from datasets import load_dataset

ds = load_dataset("davanstrien/similarity-dataset-test-sql")
```

</details>
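Once loaded, the rows could be turned into `(anchor, positive, negative)` triplets, for instance to train a similarity model. The sketch below pairs every `good` candidate with every `bad` candidate. This expansion is an assumption made for illustration, not something the dataset card describes, and `iter_triplets` is a hypothetical helper. The sample row is abbreviated from the card's example so that the block runs without downloading anything.

```python
import json
from itertools import product
from typing import Iterable, Iterator


def iter_triplets(rows: Iterable[dict]) -> Iterator[tuple[str, str, str]]:
    """Yield (anchor, positive, negative) for every good/bad pairing in each row."""
    for row in rows:
        # `generation` is assumed to be a JSON-encoded string.
        candidates = json.loads(row["generation"])
        for pos, neg in product(candidates["good"], candidates["bad"]):
            yield row["anchor"], pos, neg


# Row abbreviated from the dataset card's example (two candidates per list).
sample_row = {
    "anchor": "What is the total volume of timber sold by each salesperson, sorted by salesperson?",
    "generation": json.dumps({
        "bad": [
            "What is the average volume of timber sold by each salesperson, sorted by salesperson?",
            "What is the total volume of timber sold by each salesperson, sorted by volume?",
        ],
        "good": [
            "What is the total quantity of timber sold by each salesperson, ordered by salesperson?",
            "List the total amount of timber sold by each salesperson, sorted by salesperson.",
        ],
    }),
}

triplets = list(iter_triplets([sample_row]))
for anchor, pos, neg in triplets:
    print(f"{anchor!r}\n  + {pos!r}\n  - {neg!r}")
```

With the real dataset, a split would be passed instead of the sample list, for example `iter_triplets(ds["train"])`; the split name `train` is an assumption and should be checked against the loaded object.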
