distilabel-internal-testing/embeddings-dataset-semantically-similar
Dataset overview
Basic dataset information
- Name: embeddings-dataset-paraphrase
- Created with: distilabel
- Size category: fewer than 1,000 records
- Tags:
  - synthetic
  - distilabel
  - rlaif
Dataset structure
- Configuration: default
- Example record structure:

```json
{
  "anchor": "Astrology: I am a Capricorn Sun Cap moon and cap rising...what does that say about me?",
  "distilabel_metadata": {
    "raw_output_paraphrase": "## Positive\nWith a triple Capricorn influence, youre likely a driven and ambitious individual with a strong sense of discipline and responsibility.\n## Negative\nThe cap on my pen is always getting lost, and its really frustrating when I need to sign important documents."
  },
  "model_name": "meta-llama/Meta-Llama-3-70B-Instruct",
  "negative": "The cap on my pen is always getting lost, and its really frustrating when I need to sign important documents.",
  "positive": "With a triple Capricorn influence, youre likely a driven and ambitious individual with a strong sense of discipline and responsibility."
}
```
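
The example record suggests that the `positive` and `negative` fields are extracted from the raw model output stored under `distilabel_metadata`. The following is a minimal sketch of how such a split could be done, not the parser distilabel itself uses. It assumes the raw output always contains a `Positive` and a `Negative` heading line (with or without leading `#` marks); `split_raw_output` is a hypothetical helper name introduced only for illustration.

```python
import re

# Matches a heading line such as "## Positive" or "Negative".
# Treating the leading "#" marks as optional is an assumption, not a documented format.
_HEADING = re.compile(
    r"^[ \t]*(?:#+[ \t]*)?(positive|negative)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_raw_output(raw):
    """Split a raw generation into {"positive": ..., "negative": ...} (hypothetical helper)."""
    # re.split with one capture group returns [prefix, label, body, label, body, ...].
    parts = _HEADING.split(raw)
    result = {}
    for label, body in zip(parts[1::2], parts[2::2]):
        result[label.lower()] = body.strip()
    return result


# The record shown in the dataset card, reduced to the fields used here.
record = {
    "distilabel_metadata": {
        "raw_output_paraphrase": (
            "## Positive\n"
            "With a triple Capricorn influence, youre likely a driven and ambitious "
            "individual with a strong sense of discipline and responsibility.\n"
            "## Negative\n"
            "The cap on my pen is always getting lost, and its really frustrating "
            "when I need to sign important documents."
        )
    },
    "positive": (
        "With a triple Capricorn influence, youre likely a driven and ambitious "
        "individual with a strong sense of discipline and responsibility."
    ),
    "negative": (
        "The cap on my pen is always getting lost, and its really frustrating "
        "when I need to sign important documents."
    ),
}

parsed = split_raw_output(record["distilabel_metadata"]["raw_output_paraphrase"])
print(parsed["positive"] == record["positive"])
print(parsed["negative"] == record["negative"])
```

A check like this can be useful for spotting records where the model did not follow the expected heading layout, since those would come back with a missing key.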
Loading the dataset
- Loading with the configuration name given explicitly:

```python
from datasets import load_dataset

ds = load_dataset("distilabel-internal-testing/embeddings-dataset-paraphrase", "default")
```

- Or, more simply, without the configuration name:

```python
from datasets import load_dataset

ds = load_dataset("distilabel-internal-testing/embeddings-dataset-paraphrase")
```
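
Since each record carries an `anchor`, a `positive` and a `negative` sentence, one plausible use is to turn the rows into triplets for embedding training. The sketch below runs on plain dictionaries so that it needs no download; `to_triplets` is a hypothetical helper, and the second sample row is invented purely to show the filtering.

```python
def to_triplets(rows):
    """Collect (anchor, positive, negative) tuples, skipping incomplete rows (hypothetical helper)."""
    triplets = []
    for row in rows:
        anchor = row.get("anchor")
        positive = row.get("positive")
        negative = row.get("negative")
        # Keep a row only if all three sentences are present and non-empty.
        if anchor and positive and negative:
            triplets.append((anchor, positive, negative))
    return triplets


rows = [
    {
        # Record taken from the dataset card.
        "anchor": "Astrology: I am a Capricorn Sun Cap moon and cap rising...what does that say about me?",
        "positive": "With a triple Capricorn influence, youre likely a driven and ambitious individual with a strong sense of discipline and responsibility.",
        "negative": "The cap on my pen is always getting lost, and its really frustrating when I need to sign important documents.",
    },
    {
        # Invented row with a missing negative, only to demonstrate the filter.
        "anchor": "example anchor",
        "positive": "example positive",
        "negative": None,
    },
]

triplets = to_triplets(rows)
print(len(triplets))
```

With the dataset loaded as above, the same function could be called as `to_triplets(ds["train"])`, assuming the split is named `train`; the card does not state the split name.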



