distilabel-internal-testing/embeddings-dataset-paraphrase
Dataset Overview
Dataset Name
- Name: embeddings-dataset-paraphrase
Dataset Creation Tool
- Tool: distilabel
Dataset Size
- Size: n<1K
Dataset Tags
- Tags:
  - synthetic
  - distilabel
  - rlaif
Dataset Structure
- Structure: The dataset includes a pipeline.yaml file that can be used to reproduce, in distilabel, the pipeline that generated this dataset.
Example Structure
- Configuration: default
- Example content (JSON):

    {
      "anchor": "Astrology: I am a Capricorn Sun Cap moon and cap rising...what does that say about me?",
      "distilabel_metadata": {
        "raw_output_generate_sentence_pair_0": "## Positive\nAs a triple Capricorn, with the Sun, Moon, and Rising sign all in Capricorn, this celestial alignment suggests that you're a driven, ambitious, and responsible individual with a strong sense of discipline and perseverance.\n## Negative\nThe cap on my favorite pen has gone missing, and I'm left struggling to find a suitable replacement."
      },
      "model_name": "meta-llama/Meta-Llama-3-70B-Instruct",
      "negative": "The cap on my favorite pen has gone missing, and I'm left struggling to find a suitable replacement.",
      "positive": "As a triple Capricorn, with the Sun, Moon, and Rising sign all in Capricorn, this celestial alignment suggests that you're a driven, ambitious, and responsible individual with a strong sense of discipline and perseverance."
    }
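The example record stores the model's raw output alongside the parsed `positive` and `negative` fields. As a rough illustration of how the two relate, the sketch below splits a raw string on its `## Positive` and `## Negative` headers. The function name `split_sentence_pair` is hypothetical, and this is not necessarily how distilabel parses the output internally; it only assumes the header layout visible in the example above.

```python
import re

# Assumed layout: a "## Positive" header, the positive sentence,
# then a "## Negative" header and the negative sentence.
PAIR_PATTERN = re.compile(
    r"##\s*Positive\s*(?P<positive>.*?)\s*##\s*Negative\s*(?P<negative>.*)",
    re.DOTALL,
)


def split_sentence_pair(raw_output: str) -> dict:
    """Hypothetical helper: split a raw output string into its two sentences."""
    match = PAIR_PATTERN.search(raw_output)
    if match is None:
        # The headers were not found, so nothing can be extracted.
        return {"positive": None, "negative": None}
    return {
        "positive": match.group("positive").strip(),
        "negative": match.group("negative").strip(),
    }


raw = (
    "## Positive\n"
    "As a triple Capricorn, with the Sun, Moon, and Rising sign all in Capricorn, "
    "this celestial alignment suggests that you're a driven, ambitious, and "
    "responsible individual with a strong sense of discipline and perseverance.\n"
    "## Negative\n"
    "The cap on my favorite pen has gone missing, and I'm left struggling to find "
    "a suitable replacement."
)

pair = split_sentence_pair(raw)
print(pair["positive"])
print(pair["negative"])
```

Returning `None` for both fields when the headers are missing keeps malformed generations easy to filter out later.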
Loading the Dataset
- Loading method (Python):

    from datasets import load_dataset

    ds = load_dataset("distilabel-internal-testing/embeddings-dataset-paraphrase")
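Once loaded, each row can be reduced to an (anchor, positive, negative) triplet, the shape commonly used when training embedding models. The following is a minimal sketch that works on plain dictionaries; the `Triplet` type and the `to_triplets` helper are hypothetical names introduced for illustration, and the second sample row is made up to show the filtering.

```python
from typing import Iterable, List, NamedTuple


class Triplet(NamedTuple):
    anchor: str
    positive: str
    negative: str


def to_triplets(rows: Iterable[dict]) -> List[Triplet]:
    """Hypothetical helper: keep rows whose three text fields are non-empty strings."""
    triplets = []
    for row in rows:
        fields = [row.get(key) for key in ("anchor", "positive", "negative")]
        if all(isinstance(value, str) and value.strip() for value in fields):
            triplets.append(Triplet(*fields))
    return triplets


rows = [
    {
        "anchor": "Astrology: I am a Capricorn Sun Cap moon and cap rising...what does that say about me?",
        "positive": "As a triple Capricorn, with the Sun, Moon, and Rising sign all in Capricorn, this celestial alignment suggests that you're a driven, ambitious, and responsible individual with a strong sense of discipline and perseverance.",
        "negative": "The cap on my favorite pen has gone missing, and I'm left struggling to find a suitable replacement.",
    },
    # Made-up incomplete row, included only to show that it gets dropped.
    {"anchor": "An incomplete row", "positive": None, "negative": None},
]

triplets = to_triplets(rows)
print(len(triplets))
print(triplets[0].negative)
```

With the real dataset, the same helper could be applied to a split, for example `to_triplets(ds["train"])`, assuming the split is named `train`, which the card does not state.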



