five

Yanlin666/assignment4-qwen25-pairrm-dpo-dataset-20260427-122105

收藏
Hugging Face2026-04-27 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/Yanlin666/assignment4-qwen25-pairrm-dpo-dataset-20260427-122105
下载链接
链接失效反馈
官方服务:
资源简介:
该数据集是一个用于对话模型偏好学习的小型训练数据集,包含50个示例。每个示例包括一个提示(prompt,由角色和内容组成的对话列表)、一个被选中的响应(chosen,同样由角色和内容组成)、一个被拒绝的响应(rejected)、一个指令(instruction)、被选中响应的文本(chosen_text)、被拒绝响应的文本(rejected_text)、所有候选响应的列表(all_candidates)以及基于配对排名模型的排名列表(pairrm_ranks)。数据集设计用于支持模型训练,以区分高质量和低质量响应,适用于强化学习从人类反馈(RLHF)或类似任务。

This dataset is a small training dataset for dialogue model preference learning, containing 50 examples. Each example includes a prompt (a list of dialogues with roles and content), a chosen response (also composed of role and content), a rejected response, an instruction, the text of the chosen response (chosen_text), the text of the rejected response (rejected_text), a list of all candidate responses (all_candidates), and a ranking list based on a pairwise ranking model (pairrm_ranks). The dataset is designed to support model training in distinguishing high-quality from low-quality responses, suitable for tasks such as reinforcement learning from human feedback (RLHF) or similar applications.
提供机构:
Yanlin666
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作