argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose
收藏Hugging Face2024-09-12 更新2025-04-12 收录
下载链接:
https://hf-mirror.com/datasets/argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose
下载链接
链接失效反馈官方服务:
资源简介:
---
size_categories: n<1K
dataset_info:
features:
- name: instruction
dtype: string
- name: response
dtype: string
- name: model_name
dtype: string
splits:
- name: train
num_bytes: 19163203
num_examples: 10000
download_size: 10667832
dataset_size: 19163203
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
tags:
- synthetic
- distilabel
- rlaif
---
<p align="left">
<a href="https://github.com/argilla-io/distilabel">
<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
</a>
</p>
# Dataset Card for magpie-ultra-v0.2-test-refined-not-verbose
This dataset has been created with [distilabel](https://distilabel.argilla.io/).
## Dataset Summary
This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI:
```console
distilabel pipeline run --config "https://huggingface.co/datasets/argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose/raw/main/pipeline.yaml"
```
or explore the configuration:
```console
distilabel pipeline info --config "https://huggingface.co/datasets/argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose/raw/main/pipeline.yaml"
```
## Dataset structure
The examples have the following structure per configuration:
<details><summary> Configuration: default </summary><hr>
```json
{
"instruction": "You are a belonging of vikings. You belong to the mighty tribe led by Harald. What is your name?",
"model_name": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
"response": "I am Gunnar Ironfist, a proud warrior of Harald\u0027s tribe. My name is known throughout our lands for my unyielding ferocity in battle and my unwavering loyalty to our chieftain, Harald. What brings you to our shores, traveler?"
}
```
This subset can be loaded as:
```python
from datasets import load_dataset
ds = load_dataset("argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose", "default")
```
Or simply as it follows, since there's only one configuration and is named `default`:
```python
from datasets import load_dataset
ds = load_dataset("argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose")
```
</details>
## References
```
@misc{xu2024magpiealignmentdatasynthesis,
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.08464},
}
```
样本规模分类:样本数少于1000
数据集信息:
特征:
- 字段名:instruction,数据类型:string(字符串)
- 字段名:response,数据类型:string(字符串)
- 字段名:model_name,数据类型:string(字符串)
数据划分:
- 划分名称:train,字节大小:19163203,样本数量:10000
下载大小:10667832
数据集总大小:19163203
配置项:
- 配置名称:default,数据文件:
- 划分:train,路径:data/train-*
标签:
- 合成数据集(synthetic)
- distilabel(distilabel)
- 基于AI反馈的强化学习(rlaif)
<p align="left">
<a href="https://github.com/argilla-io/distilabel">
<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="基于distilabel构建" width="200" height="32"/>
</a>
</p>
# 数据集卡片:magpie-ultra-v0.2-test-refined-not-verbose
本数据集由[distilabel](https://distilabel.argilla.io/)构建。
## 数据集概述
本数据集包含一个`pipeline.yaml`文件,可通过`distilabel`命令行工具(CLI)复现生成该数据集的流水线:
console
distilabel pipeline run --config "https://huggingface.co/datasets/argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose/raw/main/pipeline.yaml"
或查看该配置详情:
console
distilabel pipeline info --config "https://huggingface.co/datasets/argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose/raw/main/pipeline.yaml"
## 数据集结构
每种配置下的样本结构如下:
<details><summary> 配置:default </summary><hr>
json
{
"instruction": "你是维京人的一员,隶属于哈拉尔领导的强大部落,你的名字是什么?",
"model_name": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
"response": "我是贡纳·铁拳,哈拉尔部落中一名自豪的战士。我的名字在我们的土地上广为人知,因我在战斗中无比勇猛,对酋长哈拉尔始终忠诚。旅人,是什么风把你吹到我们的海岸来了?"
}
可通过以下代码加载该子集:
python
from datasets import load_dataset
ds = load_dataset("argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose", "default")
由于该数据集仅包含一个名为`default`的配置,也可直接通过如下方式加载:
python
from datasets import load_dataset
ds = load_dataset("argilla-warehouse/magpie-ultra-v0.2-test-refined-not-verbose")
## 参考文献
@misc{xu2024magpiealignmentdatasynthesis,
title={Magpie:通过无提示对齐大语言模型从头生成对齐数据},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.08464},
}
提供机构:
argilla-warehouse



