Anthropic RLHF dataset
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/ethz-spylab/rlhf-poisoning
Description:
The Anthropic RLHF dataset is designed for reinforcement learning from human feedback (RLHF). It is divided into two subsets, "Harmless Base" and "Helpful Base", in which humans rate model outputs for harmlessness or helpfulness, respectively. Each entry is a triple consisting of a prompt, the chosen model generation, and the rejected model generation. The dataset is also used to train and evaluate models under various conditions, including studies of the impact of data poisoning. Task: reinforcement learning from human feedback.
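The entry layout described above (a prompt plus a chosen and a rejected generation, within one of two subsets) can be sketched as a small Python record. This is only an illustrative sketch: the class name `PreferenceEntry`, its field names, the subset labels in `SUBSETS`, and the sample text are assumptions made for this example, not the dataset's actual schema or contents.

```python
from dataclasses import dataclass

# Hypothetical subset labels, chosen for illustration only.
SUBSETS = ("harmless-base", "helpful-base")


@dataclass(frozen=True)
class PreferenceEntry:
    """One preference triple: a prompt, the chosen and the rejected generation."""

    prompt: str
    chosen: str
    rejected: str
    subset: str

    def __post_init__(self) -> None:
        # Reject labels outside the two subsets named in the description.
        if self.subset not in SUBSETS:
            raise ValueError(f"unknown subset: {self.subset!r}")


# Made-up example entry; the text is NOT taken from the dataset.
example = PreferenceEntry(
    prompt="How do I boil an egg?",
    chosen="Place the egg in boiling water for about ten minutes.",
    rejected="I don't know.",
    subset="helpful-base",
)
```

A reward model trained on such records would typically be asked to score `chosen` above `rejected` for the same `prompt`.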
Provided by:
Anthropic



