Cartinoe5930/Hermes_preference
Source: Hugging Face. Updated: 2023-10-19. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Cartinoe5930/Hermes_preference
Resource description:
---
license: mit
language:
- en
size_categories:
- 100K<n<1M
---
# The Hermes_preference dataset
<!-- Provide a quick summary of the dataset. -->
The **Hermes_preference** dataset is a feedback dataset for training the reward models used in RLHF.
It can also be used for DPO.
We collected the preference data from several popular feedback datasets ([UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback), [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf), [rlhf-reward-datasets](https://huggingface.co/datasets/yitingxie/rlhf-reward-datasets)) through sampling and preprocessing.
As a result, we collected approximately 190K preference examples.
To keep the feedback data high quality, we drew on [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) and [rlhf-reward-datasets](https://huggingface.co/datasets/yitingxie/rlhf-reward-datasets), which are curated datasets.
We also included data from [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) to gather examples that teach models to produce helpful and harmless responses.
We hope that the **Hermes_preference** dataset offers a useful foundation for future RLHF and DPO research!
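Both uses mentioned above consume the same kind of pair: a preferred response and a dispreferred response to one prompt. The sketch below shows, for a single pair, the standard pairwise reward-model loss and the standard DPO loss. It is only an illustration and is not taken from the Hermes training code: the function names are hypothetical, `beta = 0.1` is an arbitrary example value, and the scalar rewards and summed log-probabilities are assumed to come from models that are not shown here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) that avoids overflow in exp()."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected)."""
    return -log_sigmoid(reward_chosen - reward_rejected)


def dpo_pair_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # illustrative value, not from the dataset card
) -> float:
    """DPO loss for one pair, given summed log-probabilities of each response
    under the policy being trained and under a frozen reference model."""
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

In both cases the loss shrinks as the model separates the `chosen` response from the `rejected` one, which is why a single preference dataset can serve either objective.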
## Dataset Details
<!-- Provide a longer summary of what this dataset is. -->
The **Hermes_preference** dataset is a mixture of several popular preference datasets (UltraFeedback, hh-rlhf, rlhf-reward-datasets), as mentioned above.
Its purpose is to provide a preference dataset with more varied data.
To that end, we selected UltraFeedback, hh-rlhf, and rlhf-reward-datasets as the base datasets.
More specifically, we sampled and preprocessed these datasets to give the Hermes_preference dataset a more consistent structure.
- **Curated by:** [More Information Needed]
- **Language(s) (NLP):** en
- **License:** MIT
### Source Data
The Hermes_preference dataset consists of the following datasets.
- [**openbmb/UltraFeedback**](https://huggingface.co/datasets/openbmb/UltraFeedback)
- [**Anthropic/hh-rlhf**](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [**yitingxie/rlhf-reward-datasets**](https://huggingface.co/datasets/yitingxie/rlhf-reward-datasets)
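
The card does not describe the exact sampling sizes or preprocessing rules, so the following is only a rough sketch of how several preference sources could be sampled and merged into one list with a `source` tag. Everything in it is an assumption made for illustration: the function name `mix_preference_sources`, the `per_source_limit` parameter, the filtering rule, and the premise that each source has already been converted to `prompt` / `chosen` / `rejected` records.

```python
import random
from typing import Dict, List

Record = Dict[str, str]


def mix_preference_sources(
    sources: Dict[str, List[Record]],
    per_source_limit: int,
    seed: int = 0,
) -> List[Record]:
    """Sample up to `per_source_limit` pairs from each source and tag them."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    mixed: List[Record] = []
    for name, records in sources.items():
        # Assumed filter: drop pairs with an empty field or identical responses.
        usable = [
            r for r in records
            if r["prompt"].strip()
            and r["chosen"].strip()
            and r["rejected"].strip()
            and r["chosen"] != r["rejected"]
        ]
        sample_size = min(per_source_limit, len(usable))
        for r in rng.sample(usable, sample_size):
            mixed.append(
                {
                    "source": name,
                    "prompt": r["prompt"],
                    "chosen": r["chosen"],
                    "rejected": r["rejected"],
                }
            )
    rng.shuffle(mixed)  # interleave the sources
    return mixed
```

Capping each source separately is one simple way to keep a large source from dominating the mixture; the actual proportions used for Hermes_preference are not stated here.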
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
### Dataset Sources [optional]
<!-- Provide the basic links for the dataset. -->
- **Repository:** [gauss5930/Hermes](https://github.com/gauss5930/Hermes)
- **Model:** [Cartinoe5930/Hermes-7b]()
## Dataset Structure
The structure of the **Hermes_preference** dataset is as follows:
```
{
"source": The source dataset of the example,
"prompt": The instruction or question,
"chosen": The chosen (preferred) response,
"rejected": The rejected response
}
```
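
As a concrete reading of the structure above, the sketch below models one record as a typed dictionary and adds two small helpers for checking records and counting them per source. The four field names come from the card; the helper names, the assumption that every field is a string, and the example values (including the `source` string) are made up for illustration.

```python
from collections import Counter
from typing import Iterable, TypedDict


class PreferenceRecord(TypedDict):
    source: str    # name of the dataset the example came from
    prompt: str    # instruction or question
    chosen: str    # preferred response
    rejected: str  # dispreferred response


REQUIRED_FIELDS = ("source", "prompt", "chosen", "rejected")


def is_valid_record(record: dict) -> bool:
    """Return True if every required field is present and is a string."""
    return all(isinstance(record.get(field), str) for field in REQUIRED_FIELDS)


def count_by_source(records: Iterable[dict]) -> Counter:
    """Count valid records per source dataset."""
    return Counter(r["source"] for r in records if is_valid_record(r))


# Made-up example values; real records will differ.
example: PreferenceRecord = {
    "source": "UltraFeedback",
    "prompt": "Explain what a reward model is.",
    "chosen": "A reward model scores responses so that better ones rank higher.",
    "rejected": "I don't know.",
}
```

Records in this shape could be obtained with the Hugging Face `datasets` library's `load_dataset` function, though the available splits are not stated in this card.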
Provided by:
Cartinoe5930
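Summary of the original information
Hermes_preference dataset
Overview
The Hermes_preference dataset is a feedback dataset for training the reward models used in reinforcement learning from human feedback (RLHF). It can also be used for direct preference optimization (DPO). It was built by sampling and preprocessing data from several popular feedback datasets (UltraFeedback, hh-rlhf, and rlhf-reward-datasets), yielding approximately 190K preference examples in total.
Dataset details
The Hermes_preference dataset is a mixture of several popular preference datasets (UltraFeedback, hh-rlhf, rlhf-reward-datasets). Its purpose is to provide a preference dataset with more varied data. To that end, UltraFeedback, hh-rlhf, and rlhf-reward-datasets were selected as the base datasets, then sampled and preprocessed to give the result a more consistent structure.
Dataset information
- Language: English
- License: MIT
Source data
The Hermes_preference dataset includes the following datasets:
- openbmb/UltraFeedback
- Anthropic/hh-rlhf
- yitingxie/rlhf-reward-datasets
Dataset structure
The structure of the Hermes_preference dataset is as follows: `{ "source": "data source", "prompt": "instruction or question", "chosen": "chosen response", "rejected": "rejected response" }`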



