ITBill/INFH-6000Q-dpo-preference-dataset
Hugging Face · updated 2026-04-22 · indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/ITBill/INFH-6000Q-dpo-preference-dataset
Resource summary:
The INFH-6000Q DPO Preference Dataset is intended for text-generation tasks. It is tagged for English and Portuguese, and its size category is fewer than 1K examples. It contains preference pairs for Direct Preference Optimization (DPO). Sources: base instructions from GAIR/lima, candidate responses generated by Qwen/Qwen2.5-7B-Instruct, and preference ranking by llm-blender/PairRM. Construction pipeline: 50 instructions are sampled from the LIMA training split, Qwen2.5-7B-Instruct generates 5 candidate responses per instruction, and PairRM ranks the candidates. The highest-ranked response is kept as chosen and the lowest-ranked as rejected. The dataset file, preference_dataset.jsonl, contains 50 preference pairs. Each record includes fields such as prompt, chosen, and rejected, among others.
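The chosen/rejected selection step can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the dataset author's code: PairRM is not called. Instead, a hypothetical pairwise judge `prefer(prompt, a, b)` stands in for it, and candidates are ranked by a simple round-robin win count, which may differ from how llm-blender actually aggregates pairwise comparisons. The helper names (`rank_by_wins`, `make_pair`, `write_jsonl`, `read_jsonl`) are hypothetical. Only the `prompt`, `chosen`, and `rejected` fields are written, because the other fields in the real records are not specified here.

```python
import json
import os
import tempfile
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical pairwise judge signature: prefer(prompt, a, b) -> True if `a` beats `b`.
# In the described pipeline llm-blender/PairRM fills this role; it is NOT called here.
PairwiseJudge = Callable[[str, str, str], bool]


def rank_by_wins(prompt: str, candidates: List[str], prefer: PairwiseJudge) -> List[str]:
    """Compare every pair of candidates once and sort by win count, best first."""
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        if prefer(prompt, candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # sorted() is stable (also with reverse=True), so ties keep generation order.
    order = sorted(range(len(candidates)), key=lambda k: wins[k], reverse=True)
    return [candidates[k] for k in order]


def make_pair(prompt: str, candidates: List[str], prefer: PairwiseJudge) -> Dict[str, str]:
    """Keep the top-ranked candidate as 'chosen' and the bottom-ranked as 'rejected'."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a preference pair")
    ranked = rank_by_wins(prompt, candidates, prefer)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}


def write_jsonl(records: List[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> List[Dict[str, str]]:
    """Read a JSON Lines file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Toy judge for illustration only: the longer response wins.
    def toy_prefer(prompt: str, a: str, b: str) -> bool:
        return len(a) > len(b)

    # Five candidates per instruction, matching the described pipeline.
    cands = ["ok", "a longer answer", "the most detailed answer here", "hi!", "medium reply"]
    pair = make_pair("Explain DPO briefly.", cands, toy_prefer)
    print(pair["chosen"])    # the most detailed answer here
    print(pair["rejected"])  # ok

    # Round-trip through a JSONL file named like the dataset file.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "preference_dataset.jsonl")
        write_jsonl([pair], path)
        assert read_jsonl(path) == [pair]
```

Keeping only the extremes of the ranking (best vs. worst of five) tends to give pairs with a clear preference margin, which is the design the summary describes; the middle three candidates are discarded.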
Provided by:
ITBill



