aladinDJ/ultramix-DPO-annotated
Source: Hugging Face. Updated 2025-11-14; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/aladinDJ/ultramix-DPO-annotated
Description:
UltraMix is a high-quality preference optimization dataset curated and refined from five open-source DPO corpora: TuluDPO, ORPO, UltraFeedback, HelpSteer, and Code-Preference-Pairs. A reward-driven filtering pipeline and the Magpie annotation framework remove noisy, low-reward, and redundant preference pairs while preserving task balance. The result is a quality-, reward-, and task-aware mixture intended for Direct Preference Optimization (DPO) training.
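To make the description concrete, here is a minimal sketch of what reward-driven, task-balanced filtering of preference pairs can look like. It is not the dataset's actual pipeline: the field names (`prompt`, `task`, `reward_chosen`, `reward_rejected`), the function `filter_pairs`, the margin threshold, and the per-task cap are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def filter_pairs(pairs, min_margin=0.5, per_task_cap=2):
    """Illustrative filter: drop low-margin and duplicate pairs, cap per task.

    All field names and thresholds are assumptions, not the UltraMix schema.
    """
    seen_prompts = set()
    by_task = defaultdict(list)
    for pair in pairs:
        # Reward margin between the preferred and the rejected response.
        margin = pair["reward_chosen"] - pair["reward_rejected"]
        if margin < min_margin:
            continue  # treat a small margin as a noisy or low-reward pair
        # Normalize case and whitespace so near-identical prompts collide.
        key = " ".join(pair["prompt"].lower().split())
        if key in seen_prompts:
            continue  # redundant prompt; the first occurrence is kept
        seen_prompts.add(key)
        by_task[pair["task"]].append((margin, pair))

    kept = []
    for task in sorted(by_task):
        # Keep the highest-margin pairs in each task to preserve balance.
        ranked = sorted(by_task[task], key=lambda item: item[0], reverse=True)
        kept.extend(pair for _, pair in ranked[:per_task_cap])
    return kept
```

Capping each task separately, rather than taking a global top-k by margin, is one simple way to keep a single task category from dominating the mixture.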
Provided by:
aladinDJ



