
costadev00/dolly-15k-rlhf-instructgpt-format

Source: Hugging Face. Updated 2026-04-28; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/costadev00/dolly-15k-rlhf-instructgpt-format
Summary:
Dolly 15k RLHF Datasets in InstructGPT Format is derived from databricks/databricks-dolly-15k and repackaged for Reinforcement Learning from Human Feedback (RLHF). It provides four configurations:
1) sft: supervised fine-tuning examples with prompt, completion, and text fields;
2) rm_schema: a reward-modeling schema and prompt pool, with empty chosen and rejected fields, a reference_response, and ready_for_rm=false;
3) rm_synthetic: proxy reward-modeling pairs in which the Dolly reference_response is the chosen answer and a sampled GPT-2 SFT output is the rejected one;
4) ppo: prompt-only examples for PPO/RLHF rollouts.
The dataset uses a plain-text, InstructGPT-style format and contains train (12,010), validation (1,502), and test (1,499) splits, 15,011 examples in total. It is suited to NLP tasks such as text generation, question answering, and summarization.
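To make the four configurations concrete, here is a minimal Python sketch of how a single Dolly record (which has `instruction`, `context`, and `response` fields) could map onto the sft, rm_schema, rm_synthetic, and ppo row shapes described above. The `Instruction:/Context:/Response:` prompt template and the helper names `build_prompt`, `to_configs`, and `make_synthetic_pair` are hypothetical; this page does not document the dataset's exact prompt wording or build script. Only the output field names come from the description.

```python
from typing import Any

# Split sizes as listed in the description above.
SPLIT_SIZES = {"train": 12010, "validation": 1502, "test": 1499}


def build_prompt(instruction: str, context: str = "") -> str:
    """Hypothetical InstructGPT-style plain-text prompt; the real template may differ."""
    parts = [f"Instruction: {instruction.strip()}"]
    if context.strip():
        # Dolly's context field is empty for many categories, so include it only when present.
        parts.append(f"Context: {context.strip()}")
    parts.append("Response:")
    return "\n\n".join(parts)


def to_configs(record: dict[str, str]) -> dict[str, dict[str, Any]]:
    """Map one Dolly record to sft, rm_schema, and ppo rows (assumed shapes)."""
    prompt = build_prompt(record["instruction"], record.get("context", ""))
    reference = record["response"].strip()
    completion = " " + reference  # leading space separates it from "Response:"
    return {
        "sft": {"prompt": prompt, "completion": completion, "text": prompt + completion},
        "rm_schema": {
            "prompt": prompt,
            "chosen": "",  # left empty until preference data exists
            "rejected": "",
            "reference_response": reference,
            "ready_for_rm": False,
        },
        "ppo": {"prompt": prompt},  # prompt-only, for rollouts
    }


def make_synthetic_pair(schema_row: dict[str, Any], sampled_output: str) -> dict[str, str]:
    """rm_synthetic proxy pair: reference answer is chosen, a model sample is rejected."""
    return {
        "prompt": schema_row["prompt"],
        "chosen": schema_row["reference_response"],
        "rejected": sampled_output.strip(),
    }


if __name__ == "__main__":
    rows = to_configs({"instruction": "Name a primary color.", "context": "", "response": "Red."})
    # In the real rm_synthetic config the rejected text comes from a GPT-2 SFT model;
    # here it is a hard-coded stand-in.
    pair = make_synthetic_pair(rows["rm_schema"], "Blue, I think.")
    print(rows["sft"]["text"])
    print(pair)
    print(sum(SPLIT_SIZES.values()))
```

The synthetic pairs are a cheap proxy: they assume the human-written Dolly answer is always preferable to a small model's sample, which is convenient for exercising a reward-model pipeline but is not a substitute for real human preference labels.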
Provider: costadev00