pharaouk/ultrafeedback-binarized-preferences-cleaned
收藏Hugging Face2024-04-05 更新2024-06-11 收录
下载链接:
https://hf-mirror.com/datasets/pharaouk/ultrafeedback-binarized-preferences-cleaned
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- en
license: mit
size_categories:
- 10K<n<100K
task_categories:
- text-generation
pretty_name: UltraFeedback Binarized Preferences Cleaned
dataset_info:
features:
- name: source
dtype: string
- name: prompt
dtype: string
- name: chosen
list:
- name: content
dtype: string
- name: role
dtype: string
- name: chosen-rating
dtype: float64
- name: chosen-model
dtype: string
- name: rejected
list:
- name: content
dtype: string
- name: role
dtype: string
- name: rejected-rating
dtype: float64
- name: rejected-model
dtype: string
splits:
- name: train
num_bytes: 284937773
num_examples: 60917
download_size: 143257393
dataset_size: 284937773
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
tags:
- dpo
- preference
- ultrafeedback
---
# UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned)
This dataset represents a new iteration on top of [`argilla/ultrafeedback-binarized-preferences`](https://huggingface.co/argilla/ultrafeedback-binarized-preferences),
and is the **recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback**.
Read more about Argilla's approach towards UltraFeedback binarization at [`argilla/ultrafeedback-binarized-preferences/README.md`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences/blob/main/README.md).
## Differences with `argilla/ultrafeedback-binarized-preferences`
Thanks to the recent issue identified by [AllenAI](https://huggingface.co/allenai) related to the TruthfulQA contamination within the
original UltraFeedback dataset due to some prompts being reused from the TruthfulQA dataset (used for benchmarking
in the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) from HuggingFace H4), we also decided
to follow AllenAI's advice and remove those from the UltraFeedback dataset that we binarized using a completely different approach, which
implied using the average of the preference ratings rather than the critique overall score, as
[`HuggingFaceH4/ultrafeedback_binarized`](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) did.
Besides that, we also saw that not only the rows with the `source=truthful_qa` were contamined (for obvious reasons), but also some
coming from ShareGPT, so we also removed those doing a left join with both subsets from the [`truthful_qa`](https://huggingface.co/datasets/truthful_qa) dataset.
Additionally, we also modified the formatting to be aligned with both [`HuggingFaceH4/ultrafeedback_binarized`](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized),
and [`allenai/ultrafeedback_binarized_cleaned`](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned) in order to ease
the integration within the [`huggingface/alignment-handbook`](https://github.com/huggingface/alignment-handbook) so that the formatting is standardized.
## Reproduce
<a target="_blank" href="https://colab.research.google.com/drive/1XR9P1St4yTNY0tjti_tIjm-yzP5Bfqc0?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
To reproduce the data processing combining both our approach and the suggestions from HuggingFace H4 w.r.t. the formatting and the ones from AllenAI to
remove the TruthfulQA contamination, feel free to run the attached Colab Notebook or just view it at [`notebook.ipynb`](./notebook.ipynb) within this repository.
From Argilla we encourage anyone out there to play around, investigate, and experiment with the data, and we firmly believe on open sourcing what we do, as
ourselves, as well as the whole community, benefit a lot from open source and we also want to give back.
## Citation
If you find this dataset is useful in your work, please cite the original UltraFeedback dataset: https://huggingface.co/datasets/openbmb/UltraFeedback
Additionally, you may also want to cite our work with Notus 7B, which lead the curation of the UltraFeedback dataset:
```bibtex
@misc{notus2023,
author = {Alvaro Bartolome and Gabriel Martin and Daniel Vila},
title = {Notus},
year = {2023},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/argilla-io/notus}}
}
```
> Alphabetically ordered by last name due to equal contribution.
提供机构:
pharaouk
原始信息汇总
数据集概述
基本信息
- 语言: 英语
- 许可证: MIT
- 大小分类: 10K<n<100K
- 任务分类: 文本生成
- 美观名称: UltraFeedback Binarized Preferences Cleaned
数据集特征
- source: 字符串类型
- prompt: 字符串类型
- chosen: 列表类型,包含
- content: 字符串类型
- role: 字符串类型
- chosen-rating: 浮点数类型
- chosen-model: 字符串类型
- rejected: 列表类型,包含
- content: 字符串类型
- role: 字符串类型
- rejected-rating: 浮点数类型
- rejected-model: 字符串类型
数据集分割
- 训练集:
- 字节数: 284937773
- 示例数: 60917
- 下载大小: 143257393
- 数据集大小: 284937773
配置
- 默认配置:
- 数据文件:
- 分割: 训练
- 路径: data/train-*
- 数据文件:
标签
- dpo
- preference
- ultrafeedback



