davanstrien/similarity-dataset-test-sql

Name: davanstrien/similarity-dataset-test-sql
Creator: davanstrien
Published: 2024-05-25 16:20:30
License: not specified

Hugging Face. Updated 2024-05-25, indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/davanstrien/similarity-dataset-test-sql

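Summary:

This dataset was generated with distilabel and includes a `pipeline.yaml` file that can be used to reproduce the pipeline that generated it. Each record has four main fields: `anchor`, `positive`, `negative`, and `generation`, which hold the anchor text, a positive example, a negative example, and the generated text, respectively. The dataset has a single configuration, `default`, and can be loaded with Hugging Face's `load_dataset` function.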

Provider:

davanstrien

Original information summary

Dataset Card for similarity-dataset-test-sql

Dataset overview

This dataset contains a pipeline.yaml file that can be used to reproduce, in distilabel, the pipeline that generated it:

```console
distilabel pipeline run --config "https://huggingface.co/datasets/davanstrien/similarity-dataset-test-sql/raw/main/pipeline.yaml"
```

Or explore the configuration:

```console
distilabel pipeline info --config "https://huggingface.co/datasets/davanstrien/similarity-dataset-test-sql/raw/main/pipeline.yaml"
```

Dataset structure

The examples in each configuration have the following structure:

<details><summary> Configuration: default </summary><hr>

```json
{
  "anchor": "What is the total volume of timber sold by each salesperson, sorted by salesperson?",
  "generation": "{ \"bad\": [ \"What is the average volume of timber sold by each salesperson, sorted by salesperson?\", \"What is the total volume of timber sold by each salesperson, sorted by volume?\", \"What is the total volume of timber sold by each salesperson, without sorting?\" ], \"good\": [ \"What is the total quantity of timber sold by each salesperson, ordered by salesperson?\", \"List the total amount of timber sold by each salesperson, sorted by salesperson.\", \"Provide the total volume of timber sold by each salesperson, arranged by salesperson.\" ] }",
  "negative": "What is the average volume of timber sold by each salesperson, sorted by salesperson?",
  "positive": "What is the total quantity of timber sold by each salesperson, ordered by salesperson?"
}
```
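In the example record, `generation` is a string that itself holds JSON with a `good` list and a `bad` list, so it needs a second decoding step before the candidate sentences can be used. The following is a minimal sketch of that step. The helper name `parse_generation` is hypothetical, and the fallback to empty lists assumes that some rows may contain model output that is not valid JSON, which the source does not confirm.

```python
import json


def parse_generation(generation: str) -> tuple[list, list]:
    """Decode the JSON string stored in `generation` into (good, bad) lists.

    Returns two empty lists when the string is not valid JSON or does not
    have the expected shape (an assumption about how to treat bad rows).
    """
    try:
        payload = json.loads(generation)
    except json.JSONDecodeError:
        return [], []
    if not isinstance(payload, dict):
        return [], []
    good = payload.get("good")
    bad = payload.get("bad")
    # Keep only values that are actually lists; anything else becomes [].
    return (
        good if isinstance(good, list) else [],
        bad if isinstance(bad, list) else [],
    )


# Shortened stand-in for the `generation` value shown in the example record.
sample = '{"bad": ["average volume per salesperson"], "good": ["total quantity per salesperson"]}'
good, bad = parse_generation(sample)
```

In the single example shown here, `positive` matches the first `good` entry and `negative` matches the first `bad` entry; whether that holds for every row is not stated.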

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("davanstrien/similarity-dataset-test-sql", "default")
```

Or simply as follows, because there is only one configuration and it is named default:

```python
from datasets import load_dataset

ds = load_dataset("davanstrien/similarity-dataset-test-sql")
```

</details>
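Each record carries one `positive` and one `negative`, while its `generation` string lists several `good` and `bad` rewrites. One possible use, sketched below, is to expand a record into every (anchor, positive, negative) combination for similarity training. This is an illustration only: the helper `expand_triplets` is hypothetical, the sample record is made up with the same field names as the dataset, and nothing in the source says the dataset author intended this expansion.

```python
import json
from itertools import product


def expand_triplets(record: dict) -> list:
    """Pair every `good` rewrite with every `bad` rewrite for one record."""
    payload = json.loads(record["generation"])
    good = payload.get("good", [])
    bad = payload.get("bad", [])
    # One (anchor, positive, negative) tuple per good/bad combination.
    return [(record["anchor"], pos, neg) for pos, neg in product(good, bad)]


# Made-up record that mirrors the dataset's four fields.
example = {
    "anchor": "total volume by salesperson",
    "generation": json.dumps(
        {
            "good": ["total quantity by salesperson"],
            "bad": ["average volume by salesperson", "total volume, unsorted"],
        }
    ),
    "positive": "total quantity by salesperson",
    "negative": "average volume by salesperson",
}

triplets = expand_triplets(example)
```

A record with three `good` and three `bad` entries, like the example in the card, would yield nine triplets under this scheme.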
