YQ12/smoketest-assignment4-pairrm-qwen25-preference
Source: Hugging Face. Updated 2026-04-27. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/YQ12/smoketest-assignment4-pairrm-qwen25-preference
Description:
This dataset contains dialogue data for preference learning or reinforcement learning. Each example includes a prompt, a chosen response (chosen), a rejected response (rejected), an instruction, the list of all candidate responses (all_responses), pairwise rankings (pairrm_ranks), and the indices of the chosen and rejected responses. The structure supports training models to distinguish response quality, for example in alignment or ranking tasks. The training split contains only 3 examples, so the data volume is very small.
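To make the record layout concrete, here is a minimal Python sketch of how one such example could be assembled from a list of candidate responses and their ranks. It is an illustration under stated assumptions, not the dataset author's pipeline: the index field names `chosen_idx` and `rejected_idx` are hypothetical (the description only says the indices are stored), the helper `build_example` is invented for this sketch, and it assumes a lower rank means a better response. The sample prompt and responses are made up.

```python
from typing import List, TypedDict


class PreferenceExample(TypedDict):
    prompt: str
    instruction: str
    all_responses: List[str]
    pairrm_ranks: List[int]
    chosen: str
    rejected: str
    chosen_idx: int    # hypothetical field name
    rejected_idx: int  # hypothetical field name


def build_example(
    prompt: str,
    instruction: str,
    responses: List[str],
    ranks: List[int],
) -> PreferenceExample:
    """Pick the best- and worst-ranked responses as the chosen/rejected pair.

    Assumption: a lower rank is better (rank 1 is the preferred response).
    """
    if len(responses) != len(ranks):
        raise ValueError("responses and ranks must have the same length")
    if len(responses) < 2:
        raise ValueError("need at least two responses to form a pair")

    positions = range(len(ranks))
    chosen_idx = min(positions, key=lambda i: ranks[i])    # best rank
    rejected_idx = max(positions, key=lambda i: ranks[i])  # worst rank

    return {
        "prompt": prompt,
        "instruction": instruction,
        "all_responses": list(responses),
        "pairrm_ranks": list(ranks),
        "chosen": responses[chosen_idx],
        "rejected": responses[rejected_idx],
        "chosen_idx": chosen_idx,
        "rejected_idx": rejected_idx,
    }


# Illustrative data only; not taken from the dataset.
example = build_example(
    prompt="What is 2 + 2?",
    instruction="Answer briefly.",
    responses=["5", "4", "Maybe 4?"],
    ranks=[3, 1, 2],
)
```

Pairs in this shape (prompt, chosen, rejected) are the usual input for preference-based training, while keeping `all_responses` and `pairrm_ranks` alongside them lets a reader re-derive or audit the pair selection.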
Provider:
YQ12



