
Unified-Feedback

arXiv, indexed 2025-09-30
Download link:
https://huggingface.co/datasets/llm-blender/Unified-Feedback
Overview:
Unified-Feedback is one of the largest pairwise feedback datasets to date for training reward models for large language models (LLMs). The full dataset contains 400,000 training instances and 8,000 evaluation instances. For Reinforcement Learning from Human Feedback (RLHF) experiments, it was downsampled to 20,000 samples for training and an additional 1,000 samples for evaluation.
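The downsampling step mentioned above can be illustrated with a minimal sketch. It assumes each subset is drawn uniformly at random, without replacement, from its own split; the description does not say how the subsets were actually selected. The `downsample` helper, the record field names (`prompt`, `chosen`, `rejected`), and the synthetic records are all hypothetical stand-ins, not the dataset's real schema or contents.

```python
import random


def downsample(records, k, seed=0):
    """Return k records drawn uniformly at random without replacement."""
    if k > len(records):
        raise ValueError("cannot draw more records than the split contains")
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(records, k)


def make_split(n):
    """Build n synthetic pairwise records (hypothetical field names)."""
    return [
        {"id": i, "prompt": f"prompt {i}", "chosen": "A", "rejected": "B"}
        for i in range(n)
    ]


# Split sizes taken from the description: 400,000 train, 8,000 evaluation.
train_split = make_split(400_000)
eval_split = make_split(8_000)

# Subset sizes taken from the description: 20,000 train, 1,000 evaluation.
train_subset = downsample(train_split, 20_000)
eval_subset = downsample(eval_split, 1_000)

print(len(train_subset), len(eval_subset))
```

Fixing the seed means the same 20,000 and 1,000 records are selected on every run, which makes results from different reward-model training runs comparable.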
Provider:
Hugging Face