ITBill/INFH-6000Q-dpo-preference-dataset

Name: ITBill/INFH-6000Q-dpo-preference-dataset
Creator: ITBill
Published: 2026-04-22 01:16:27
License: not specified

Source: Hugging Face
Updated: 2026-04-22
Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/ITBill/INFH-6000Q-dpo-preference-dataset

Description:

The INFH-6000Q DPO Preference Dataset targets text-generation tasks, covers English and Portuguese, and falls in the under-1K size category. It contains preference pairs for Direct Preference Optimization (DPO). Three upstream resources are used: GAIR/lima as the base instruction source, Qwen/Qwen2.5-7B-Instruct as the candidate generator, and llm-blender/PairRM as the preference ranker. The construction pipeline samples 50 instructions from the LIMA training split, generates 5 candidate responses per instruction with Qwen2.5-7B-Instruct, and ranks the candidates with PairRM; the highest-ranked response is kept as chosen and the lowest-ranked as rejected. The dataset ships as a single file, preference_dataset.jsonl, with 50 preference pairs. Each record includes fields such as prompt, chosen, and rejected, among others.
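The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the creator's actual script: `build_preference_pairs`, `write_jsonl`, `read_jsonl`, the `generate` and `score` callables, and the `seed` parameter are all hypothetical names introduced here. In particular, PairRM compares candidates pairwise, whereas this sketch stands in a plain scalar scoring function for the ranker, and the generator is likewise a placeholder for Qwen2.5-7B-Instruct.

```python
import json
import random
from typing import Callable, Dict, List


def build_preference_pairs(
    instructions: List[str],
    generate: Callable[[str, int], List[str]],  # hypothetical stand-in for the candidate generator
    score: Callable[[str, str], float],         # hypothetical stand-in for the preference ranker
    num_prompts: int = 50,
    num_candidates: int = 5,
    seed: int = 0,                               # assumption: the source does not state a seed
) -> List[Dict[str, str]]:
    """Sample prompts, generate candidates, and keep the best and worst per prompt."""
    rng = random.Random(seed)
    sampled = rng.sample(instructions, min(num_prompts, len(instructions)))
    pairs = []
    for prompt in sampled:
        candidates = generate(prompt, num_candidates)
        # Sort from highest to lowest score; first is "chosen", last is "rejected".
        ranked = sorted(candidates, key=lambda c: score(prompt, c), reverse=True)
        pairs.append({"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]})
    return pairs


def write_jsonl(records: List[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> List[Dict[str, str]]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

With a real generator and ranker plugged in, calling `write_jsonl(pairs, "preference_dataset.jsonl")` would produce a file with the same three core fields the description lists. Any additional fields in the published records are not modeled here.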

Provider:

ITBill
