MBZUAI-Paris/Egyptian-DPO-Mixture

Source: Hugging Face | Updated: 2025-07-07 | Indexed: 2025-08-09
Download link:
https://hf-mirror.com/datasets/MBZUAI-Paris/Egyptian-DPO-Mixture
Description:
The Nile-Chat Off- and On-Policy DPO Alignment Dataset (Stylistic & Code-Switching Corrections + Safety & Instruction Following) supports Direct Preference Optimization (DPO) fine-tuning. Its off-policy alignment signals improve stylistic control, code-switching behavior, and instruction adherence; its on-policy alignment signals improve instruction following and safety behavior. The dataset targets limitations observed in Nile-Chat-4B-SFT on diverse instructions, focusing on safety alignment, task reliability, instruction following, and self-identification.
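For readers unfamiliar with how a preference dataset like this is consumed, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not taken from the dataset card: the function name `dpo_loss`, the `beta` default, and the log-probability values in the usage lines are all hypothetical and chosen only for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative default, not a value from the dataset card.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(x)) written as log(1 + exp(-x)), split by sign for stability.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Hypothetical log-probabilities, for illustration only.
neutral = dpo_loss(-12.0, -12.0, -12.0, -12.0)   # policy equals reference
improved = dpo_loss(-10.0, -14.0, -12.0, -12.0)  # policy favors the chosen response
print(neutral, improved)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is the signal a preference pair provides during DPO training.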
Provider:
MBZUAI-Paris