nvidia/Nemotron-RLHF-GenRM-v1
收藏Hugging Face2026-03-11 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/nvidia/Nemotron-RLHF-GenRM-v1
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- en
license:
- odc-by
task_categories:
- reinforcement-learning
- text-generation
configs:
- config_name: default
data_files:
- split: train
path: data/train.jsonl
---
## Dataset Description:
This dataset is designed to train Generative Reward Models (GenRMs). It leverages reinforcement learning at scale to train accurate and robust GenRMs that generalize better than traditional Bradley-Terry models and reduce the risk of reward hacking.
The dataset is composed of:
* Preference data focused on diverse domains
* A synthetic safety blend
The data follows a "meta-prompt" structure where the model is instructed to act as an expert evaluation judge. For GenRM training, each sample includes:
1. **System/User Prompt**: Instructions for the judge, including evaluation criteria and scoring guidelines.
2. **Conversation Context**: The dialogue history and the latest user query.
3. **Responses to be Scored**: Two candidate assistant responses (Response 1 and Response 2).
4. **Evaluation Plan**: Specific rubrics for the current case (e.g., safety, helpfulness, refusal of harmful requests).
5. **Output Format**: Instructions to output a specific JSON format containing analysis, individual scores, and a ranking.
The GenRM reasons through the strengths and weaknesses of both responses and produces:
* **Individual Helpfulness Score (1-5)**: Higher means more helpful.
* **Ranking Score (1-6)**: 1 denotes response 1 is far superior; 6 denotes response 2 is far superior.
This dataset is ready for commercial use.
## Dataset Owner(s):
NVIDIA Corporation
## Dataset Creation Date:
Created on: 12/01/2025
Last Modified on: 12/01/2025
## License/Terms of Use:
This dataset is licensed under the ODC Attribution License (https://opendatacommons.org/licenses/by/1-0/).
Additional Information: Contains information from allenai/WildChat-1M which is made available under the ODC Attribution License.
## Intended Usage:
This dataset is intended for:
1. Training Generative Reward Models (GenRMs) to reason about response quality and provide granular feedback.
2. Improving model generalization and reducing reward hacking compared to traditional methods.
## Dataset Characterization
**Data Collection Method**
* Hybrid: Human, Synthetic
**Labeling Method**
* Human
## Dataset Format
Modality: Text
Format: JSONL
Structure:
Each line is a JSON object containing a `messages` list. The user message is a structured prompt for a judge.
* **Input**: "You are an expert evaluation judge..." followed by Context, Responses, Plan, and Guidelines.
* **Metadata**: Includes `question_id`, `task_name`, `dataset` source, and ground truth `ranking`.
## Dataset Quantification
| Subset | Samples |
|--------|---------|
| train | 299,517 |
Total Data Storage: ~5GB
## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal developer teams to ensure this dataset meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/)
提供机构:
nvidia



