aari1995/ultradistil-intel-orca-dpo-de
收藏Hugging Face2024-01-29 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/aari1995/ultradistil-intel-orca-dpo-de
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
language:
- de
tags:
- rlaif
- dpo
- rlhf
- distilabel
- mt
- german
---
(WIP)
Currently this dataset is WIP - there seem to be some translation tasks in the dataset that may not be completly accurate.
In the next days, they will be filtered out. To do so manually, just look for "übersetz" in the columns "input", "chosen" or "rejected"
and exclude them from your training pipeline.
# ULTRA Distilabel Intel Orca DPO (German):
This is the machine-translated German version of Intel's Orca DPO pairs, distilabeled by argilla.
The provided dataset was additionally filtered to only include high-quality examples, as suggested by argilla:
```python
from datasets import load_dataset
# Instead of this:
# dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
# use this:
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dataset = dataset.filter(
lambda r:
r["status"] != "tie" and
r["chosen_score"] >= 8 and
not r["in_gsm8k_train"]
)
```
The original dataset is around 12k examples, but only filtering to high quality examples allows to reduce the dataset by over 50 % to around 6k.
# Columns:
"system": the system message
"input": is the user prompt
"chosen": the chosen reply to the prompt.
"rejected": the rejected reply to the prompt.
Note: for training with DPOTrainer, you should format system + input as "prompt" with the special tokens and the "assistant" token of your model.
# Acknowledgements:
I would like to thank intel for the initial [dataset](https://huggingface.co/datasets/Intel/orca_dpo_pairs) and argilla for the distilled [dataset](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs).
提供机构:
aari1995
原始信息汇总
数据集概述
数据集名称
ULTRA Distilabel Intel Orca DPO (German)
数据集描述
本数据集是Intel的Orca DPO对子的机器翻译德文版,由argilla进行distilabel处理。数据集经过额外筛选,仅包含高质量示例。
数据集大小
原始数据集约12,000个示例,经过筛选后减少至约6,000个示例。
数据集列信息
- system: 系统消息
- input: 用户提示
- chosen: 对提示的选定回复
- rejected: 对提示的拒绝回复
使用注意事项
- 目前数据集仍在进行中,部分翻译任务可能不完全准确,建议手动筛选并排除含有"übersetz"的示例。
- 使用DPOTrainer训练时,应将系统消息和用户提示格式化为"prompt",并包含模型特定的特殊标记和"assistant"标记。
许可证
Apache-2.0
语言
德语
标签
- rlaif
- dpo
- rlhf
- distilabel
- mt
- german
致谢
感谢Intel提供的初始数据集和argilla提供的distilabel数据集。



