Yanlin666/assignment4-qwen25-pairrm-dpo-dataset-20260427-122105

Name: Yanlin666/assignment4-qwen25-pairrm-dpo-dataset-20260427-122105
Creator: Yanlin666
Published: 2026-04-27 14:28:26
License: 暂无描述

Hugging Face2026-04-27 更新2026-05-03 收录

下载链接：

https://hf-mirror.com/datasets/Yanlin666/assignment4-qwen25-pairrm-dpo-dataset-20260427-122105

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集是一个用于对话模型偏好学习的小型训练数据集，包含50个示例。每个示例包括一个提示（prompt，由角色和内容组成的对话列表）、一个被选中的响应（chosen，同样由角色和内容组成）、一个被拒绝的响应（rejected）、一个指令（instruction）、被选中响应的文本（chosen_text）、被拒绝响应的文本（rejected_text）、所有候选响应的列表（all_candidates）以及基于配对排名模型的排名列表（pairrm_ranks）。数据集设计用于支持模型训练，以区分高质量和低质量响应，适用于强化学习从人类反馈（RLHF）或类似任务。

This dataset is a small training dataset for dialogue model preference learning, containing 50 examples. Each example includes a prompt (a list of dialogues with roles and content), a chosen response (also composed of role and content), a rejected response, an instruction, the text of the chosen response (chosen_text), the text of the rejected response (rejected_text), a list of all candidate responses (all_candidates), and a ranking list based on a pairwise ranking model (pairrm_ranks). The dataset is designed to support model training in distinguishing high-quality from low-quality responses, suitable for tasks such as reinforcement learning from human feedback (RLHF) or similar applications.

提供机构：

Yanlin666

5,000+

优质数据集

54 个

任务类型

进入经典数据集