Unified-Feedback
Source: arXiv. Indexed 2025-09-30.
Download link:
https://huggingface.co/datasets/llm-blender/Unified-Feedback
Description:
Unified-Feedback is one of the largest pairwise feedback datasets to date for training reward models for large language models (LLMs). The full dataset contains 400,000 training instances and 8,000 evaluation instances. For Reinforcement Learning from Human Feedback (RLHF) experiments, it was downsampled to 20,000 samples for training, with an additional 1,000 samples for evaluation.
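The downsampling described above can be illustrated with a minimal sketch. It draws row indices rather than loading the data, and it rests on assumptions that the description does not confirm: that the 20,000 training samples come from the 400,000-instance training set, that the 1,000 evaluation samples come from the 8,000-instance evaluation set, and that sampling is uniform without replacement. The helper name `downsample_indices` and the seeds are hypothetical.

```python
import random

def downsample_indices(pool_size, k, seed):
    """Pick k distinct row indices from a split that has pool_size rows.

    Hypothetical helper: uniform sampling without replacement is an
    assumption, not a documented detail of the original experiments.
    """
    if k > pool_size:
        raise ValueError("cannot sample more rows than the split contains")
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return sorted(rng.sample(range(pool_size), k))

# Split sizes as stated in the description.
TRAIN_SIZE, EVAL_SIZE = 400_000, 8_000

train_idx = downsample_indices(TRAIN_SIZE, 20_000, seed=0)  # seed is arbitrary
eval_idx = downsample_indices(EVAL_SIZE, 1_000, seed=1)     # seed is arbitrary

print(len(train_idx), len(eval_idx))  # 20000 1000
```

With the Hugging Face `datasets` library, index lists like these could be passed to `Dataset.select` to materialize the subsets. The subset names and split names to load are not given in the description above.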
Provider:
Hugging Face



