davanstrien/similarity-dataset-test-sql
Dataset Card for similarity-dataset-test-sql
Dataset Summary
This dataset contains a pipeline.yaml file that can be used to reproduce, in distilabel, the pipeline that generated it:
distilabel pipeline run --config "https://huggingface.co/datasets/davanstrien/similarity-dataset-test-sql/raw/main/pipeline.yaml"
Or explore the configuration:
distilabel pipeline info --config "https://huggingface.co/datasets/davanstrien/similarity-dataset-test-sql/raw/main/pipeline.yaml"
Dataset Structure
The examples in each configuration have the following structure:
<details><summary> Configuration: default </summary><hr>
{
  "anchor": "What is the total volume of timber sold by each salesperson, sorted by salesperson?",
  "generation": "{ \"bad\": [ \"What is the average volume of timber sold by each salesperson, sorted by salesperson?\", \"What is the total volume of timber sold by each salesperson, sorted by volume?\", \"What is the total volume of timber sold by each salesperson, without sorting?\" ], \"good\": [ \"What is the total quantity of timber sold by each salesperson, ordered by salesperson?\", \"List the total amount of timber sold by each salesperson, sorted by salesperson.\", \"Provide the total volume of timber sold by each salesperson, arranged by salesperson.\" ] }",
  "negative": "What is the average volume of timber sold by each salesperson, sorted by salesperson?",
  "positive": "What is the total quantity of timber sold by each salesperson, ordered by salesperson?"
}
This subset can be loaded as:
from datasets import load_dataset
ds = load_dataset("davanstrien/similarity-dataset-test-sql", "default")
Or simply as follows, since there is only one configuration and it is named default:
from datasets import load_dataset
ds = load_dataset("davanstrien/similarity-dataset-test-sql")
</details>
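In the example record, the generation field is a JSON-encoded string holding lists of "good" and "bad" rewrites, while positive and negative each hold a single sentence. The sketch below shows one possible way to expand a row into every (anchor, positive, negative) triplet. It assumes that every row follows the shape of the example above; the helper name expand_triplets and the fallback to the positive and negative fields are illustrative choices, not part of the dataset or of distilabel.

```python
import json
from itertools import product

# Sample row copied from the example shown in this card.
record = {
    "anchor": "What is the total volume of timber sold by each salesperson, sorted by salesperson?",
    "generation": json.dumps({
        "bad": [
            "What is the average volume of timber sold by each salesperson, sorted by salesperson?",
            "What is the total volume of timber sold by each salesperson, sorted by volume?",
            "What is the total volume of timber sold by each salesperson, without sorting?",
        ],
        "good": [
            "What is the total quantity of timber sold by each salesperson, ordered by salesperson?",
            "List the total amount of timber sold by each salesperson, sorted by salesperson.",
            "Provide the total volume of timber sold by each salesperson, arranged by salesperson.",
        ],
    }),
    "negative": "What is the average volume of timber sold by each salesperson, sorted by salesperson?",
    "positive": "What is the total quantity of timber sold by each salesperson, ordered by salesperson?",
}


def expand_triplets(row):
    """Return every (anchor, positive, negative) combination for one row.

    Hypothetical helper: parses the JSON string in row["generation"] and
    pairs each "good" sentence with each "bad" sentence.
    """
    try:
        parsed = json.loads(row["generation"])
        good, bad = parsed["good"], parsed["bad"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Assumed fallback: if the generation string is not usable,
        # keep only the single pre-selected pair stored in the row.
        good, bad = [row["positive"]], [row["negative"]]
    return [(row["anchor"], g, b) for g, b in product(good, bad)]


triplets = expand_triplets(record)
print(len(triplets))  # 3 good x 3 bad = 9 triplets
```

The same function can be applied to rows obtained from load_dataset, for example by iterating over the loaded split and collecting the returned triplets. Whether all nine combinations per row are useful for training depends on the similarity model, so treat this as a starting point.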



