Dahoas-rm-static
Indexed from arXiv on 2025-09-30
Download link:
https://huggingface.co/datasets/Dahoas/rm-static
Resource overview:
This is a statically split dataset for training reward models. It contains samples for supervised fine-tuning, samples for reward model training, and samples for validating downstream tasks. The dataset is partitioned into subsets by training purpose, and these subsets are used to demonstrate the effectiveness of the proposed method. In approximate terms, it provides 15,000 samples for supervised fine-tuning, 30,000 samples for reward model training, 8,000 samples for validation, and 5,000 samples for testing. The dataset's task is to train reward models for reinforcement learning from human feedback (RLHF).
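To make the idea of a static split concrete, here is a minimal sketch that partitions a pool of sample indices into four fixed, disjoint subsets using a seeded shuffle. The function name `static_split`, the subset keys, the seed, and the pool size are all hypothetical. The sizes are the approximate figures from the description treated as exact, and the source does not say how the actual split was produced, so this is an illustration rather than the dataset's real procedure.

```python
import random

# Approximate subset sizes quoted in the description (assumed exact here).
SPLIT_SIZES = {
    "sft": 15_000,           # supervised fine-tuning
    "reward_model": 30_000,  # reward model training
    "validation": 8_000,
    "test": 5_000,
}


def static_split(num_samples, sizes=SPLIT_SIZES, seed=0):
    """Partition indices 0..num_samples-1 into fixed, disjoint subsets.

    The seeded shuffle makes the assignment reproducible, which is what
    makes the split "static": every run yields the same subsets.
    """
    total = sum(sizes.values())
    if num_samples < total:
        raise ValueError(f"need at least {total} samples, got {num_samples}")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # local RNG, no global state touched
    splits = {}
    start = 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


# Hypothetical pool of 58,000 samples (the sum of the four subset sizes).
splits = static_split(58_000)
print({name: len(idx) for name, idx in splits.items()})
```

Because the shuffle uses a fixed seed, calling `static_split` twice with the same arguments returns identical subsets, so the supervised fine-tuning, reward model, validation, and test data never leak into one another between runs.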
Provided by:
Anthropic
Collected summary
Background and challenges
Background overview
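Dahoas-rm-static is a statically split dataset for training reward models, aimed specifically at reinforcement learning from human feedback (RLHF). It comprises several subsets: roughly 15,000 samples for supervised fine-tuning, 30,000 for reward model training, 8,000 for validation, and 5,000 for testing. It is intended to demonstrate the effectiveness of the proposed method.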
The content above was collected and summarized by 遇见数据集.



