Cartinoe5930/Hermes_preference

Source: Hugging Face. Updated 2023-10-19; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Cartinoe5930/Hermes_preference
Resource description:
---
license: mit
language:
- en
size_categories:
- 100K<n<1M
---

# The Hermes_preference dataset

<!-- Provide a quick summary of the dataset. -->

The **Hermes_preference** dataset is a feedback dataset for training the reward models used in RLHF. It can also be used for DPO.

We collected preference data from several popular feedback datasets ([UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback), [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf), [rlhf-reward-datasets](https://huggingface.co/datasets/yitingxie/rlhf-reward-datasets)) through sampling and preprocessing, which yielded approximately 190K preference examples.

To keep the feedback data high quality, we drew on [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) and [rlhf-reward-datasets](https://huggingface.co/datasets/yitingxie/rlhf-reward-datasets), which are curated datasets. We also included data from [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) to add examples that teach models to produce helpful and harmless responses.

We hope the **Hermes_preference** dataset proves useful for future RLHF and DPO research.

## Dataset Details

<!-- Provide a longer summary of what this dataset is. -->

The **Hermes_preference** dataset is a mixture of the popular preference datasets mentioned above (UltraFeedback, hh-rlhf, rlhf-reward-datasets). Its purpose is to provide a preference dataset with more varied data. To that end, we selected UltraFeedback, hh-rlhf, and rlhf-reward-datasets as the base datasets, then sampled and preprocessed them so that every example in Hermes_preference follows one consistent structure.

- **Curated by:** [More Information Needed]
- **Language(s) (NLP):** en
- **License:** MIT

### Source Data

The Hermes_preference dataset consists of the following datasets.

- [**openbmb/UltraFeedback**](https://huggingface.co/datasets/openbmb/UltraFeedback)
- [**Anthropic/hh-rlhf**](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [**yitingxie/rlhf-reward-datasets**](https://huggingface.co/datasets/yitingxie/rlhf-reward-datasets)

<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->

### Dataset Sources [optional]

<!-- Provide the basic links for the dataset. -->

- **Repository:** [gauss5930/Hermes](https://github.com/gauss5930/Hermes)
- **Model:** [Cartinoe5930/Hermes-7b]()

## Dataset Structure

The structure of the **Hermes_preference** dataset is as follows:

```
{
  "source": The source dataset of the example,
  "prompt": The instruction or question,
  "chosen": The chosen response,
  "rejected": The rejected response
}
```
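As a rough illustration of the record layout above, the sketch below types a record and checks that the four fields are present. It assumes every field is a plain string, which the card does not state explicitly. `PreferenceRecord`, `validate_record`, and the example record (including its `source` value) are hypothetical and are not taken from the dataset.

```python
from typing import TypedDict


class PreferenceRecord(TypedDict):
    """Assumed shape of one example: four string fields."""
    source: str
    prompt: str
    chosen: str
    rejected: str


REQUIRED_FIELDS = ("source", "prompt", "chosen", "rejected")


def validate_record(record: dict) -> PreferenceRecord:
    """Return the record as a PreferenceRecord, or raise ValueError."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"field {field!r} must be a non-empty string")
    if record["chosen"] == record["rejected"]:
        raise ValueError("chosen and rejected responses are identical")
    return PreferenceRecord(**{f: record[f] for f in REQUIRED_FIELDS})


# Hypothetical record, invented for illustration only.
example = {
    "source": "UltraFeedback",
    "prompt": "Explain what a reward model is.",
    "chosen": "A reward model scores responses so that preferred ones rank higher.",
    "rejected": "I don't know.",
}

record = validate_record(example)
print(record["source"])
```

A check like this can be run once over the data before training, so that malformed pairs are caught early rather than surfacing as silent training noise.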
Provided by:
Cartinoe5930
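Summary of original information

The Hermes_preference dataset

Overview

The Hermes_preference dataset is a feedback dataset for training reward models, which are used in reinforcement learning from human feedback (RLHF). It can also be used for direct preference optimization (DPO). The dataset was built by sampling and preprocessing data from several popular feedback datasets (UltraFeedback, hh-rlhf, and rlhf-reward-datasets), yielding approximately 190K preference examples in total.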
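To make the two uses concrete, here is a minimal sketch of the standard pairwise objectives that consume a (chosen, rejected) pair: the Bradley-Terry style reward-model loss and the DPO loss. These are the textbook formulas, not code from the Hermes repository. The scalar inputs stand in for model outputs, and the function names and the default `beta` are assumptions made for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: -log sigmoid(r_chosen - r_rejected)."""
    return -log_sigmoid(reward_chosen - reward_rejected)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; tune per experiment
) -> float:
    """DPO loss for one pair, given summed log-probabilities of each response."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -log_sigmoid(beta * (policy_margin - ref_margin))


# The loss shrinks as the chosen response is scored further above the rejected one.
print(reward_model_loss(2.0, 0.0) < reward_model_loss(0.0, 2.0))
```

In both cases the `chosen` and `rejected` fields supply the two sides of the comparison, and `prompt` is the shared context they are conditioned on.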

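Dataset details

The Hermes_preference dataset is a mixture of several popular preference datasets (UltraFeedback, hh-rlhf, rlhf-reward-datasets). Its purpose is to build a preference dataset containing more varied data. To that end, UltraFeedback, hh-rlhf, and rlhf-reward-datasets were selected as base datasets, then sampled and preprocessed to give the result a more consistent structure.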
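The card does not describe the actual sampling ratios or preprocessing steps, so the following is only a sketch of what "sample, preprocess, and merge" could look like. The function `mix_preference_sources`, the per-source cap, the whitespace stripping, and the toy records are all hypothetical.

```python
import random


def mix_preference_sources(sources, samples_per_source, seed=0):
    """Sample up to `samples_per_source` pairs from each source and tag them.

    `sources` maps a source name to a list of dicts that already have
    "prompt", "chosen", and "rejected" keys (an assumption of this sketch).
    """
    rng = random.Random(seed)
    mixed = []
    for name, records in sources.items():
        k = min(samples_per_source, len(records))
        for record in rng.sample(records, k):
            mixed.append({
                "source": name,
                "prompt": record["prompt"].strip(),
                "chosen": record["chosen"].strip(),
                "rejected": record["rejected"].strip(),
            })
    rng.shuffle(mixed)  # interleave sources so no single one dominates a batch
    return mixed


# Toy stand-ins for the real source datasets.
toy_sources = {
    "UltraFeedback": [
        {"prompt": f" q{i} ", "chosen": "good", "rejected": "bad"} for i in range(5)
    ],
    "hh-rlhf": [
        {"prompt": f" h{i} ", "chosen": "helpful", "rejected": "harmful"} for i in range(2)
    ],
}
mixed = mix_preference_sources(toy_sources, samples_per_source=3)
print(len(mixed))
```

Keeping the `source` tag on every record makes it possible to filter or reweight by origin later without re-running the merge.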

Dataset information

  • Language: English
  • License: MIT

Source data

The Hermes_preference dataset comprises the following datasets:

  • openbmb/UltraFeedback
  • Anthropic/hh-rlhf
  • yitingxie/rlhf-reward-datasets

Dataset structure

The structure of the Hermes_preference dataset is as follows:

    {
      "source": "data source",
      "prompt": "instruction or question",
      "chosen": "chosen response",
      "rejected": "rejected response"
    }
