Dahoas-rm-static

Name: Dahoas-rm-static
Creator: Anthropic
License: No description provided

Source: arXiv (indexed 2025-09-30)

Download link:

https://huggingface.co/datasets/Dahoas/rm-static

Resource description:

This is a statically split dataset for training reward models. It contains samples for supervised fine-tuning, reward model training, and downstream task validation, partitioned into subsets by training purpose in order to demonstrate the effectiveness of the proposed method. It comprises approximately 15,000 samples for supervised fine-tuning, 30,000 for reward model training, 8,000 for validation, and 5,000 for testing. The dataset's task is training reward models for reinforcement learning from human feedback (RLHF).
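The page does not say how the static split was produced. As a minimal sketch only, the following shows one way a fixed, reproducible partition with the approximate subset sizes quoted above could be built. The subset names (`sft`, `reward_model`, `validation`, `test`), the function `static_split`, and the seed are hypothetical and chosen for illustration; they are not taken from the dataset itself.

```python
import random

# Approximate subset sizes quoted in the description (assumed exact here).
SPLIT_SIZES = {
    "sft": 15_000,           # supervised fine-tuning
    "reward_model": 30_000,  # reward model training
    "validation": 8_000,
    "test": 5_000,
}


def static_split(sample_ids, sizes=SPLIT_SIZES, seed=0):
    """Partition sample_ids into fixed, non-overlapping subsets.

    The same seed always yields the same partition, which is what
    makes the split "static" across runs.
    """
    shuffled = list(sample_ids)
    needed = sum(sizes.values())
    if len(shuffled) < needed:
        raise ValueError(f"need {needed} samples, got {len(shuffled)}")

    # A private Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(shuffled)

    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = shuffled[start:start + size]
        start += size
    return splits


# Hypothetical usage: 58,000 integer IDs stand in for real samples.
splits = static_split(range(58_000))
```

Because the seed is fixed, calling `static_split` again on the same IDs returns an identical partition, so results on the validation and test subsets stay comparable between experiments.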

Provided by:

Anthropic

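Collected summary

Background and challenges

Background overview

Dahoas-rm-static is a statically split dataset for training reward models, aimed specifically at reinforcement learning from human feedback (RLHF). It comprises several subsets: about 15,000 samples for supervised fine-tuning, 30,000 for reward model training, 8,000 for validation, and 5,000 for testing. It is intended to demonstrate the effectiveness of the proposed method.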

The content above was collected and summarized by 遇见数据集.
