MBZUAI-Paris/Egyptian-DPO-Mixture
Source: Hugging Face. Updated 2025-07-07; indexed 2025-08-09.
Download link:
https://hf-mirror.com/datasets/MBZUAI-Paris/Egyptian-DPO-Mixture
Description:
The Nile-Chat Off- and On-Policy DPO Alignment Dataset (Stylistic & Code-Switching Corrections + Safety & Instruction Following) supports Direct Preference Optimization (DPO) fine-tuning with two kinds of alignment signal: off-policy signals that improve stylistic control, code-switching behavior, and instruction adherence, and on-policy signals that improve instruction following and safety behavior. It targets limitations observed in Nile-Chat-4B-SFT across diverse instructions, focusing on safety alignment, task reliability, instruction following, and self-identification.
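To illustrate how preference pairs like these are consumed during DPO fine-tuning, here is a minimal sketch of the standard per-pair DPO loss. It assumes (the listing does not show the schema) that each record pairs a prompt with a preferred ("chosen") and a dispreferred ("rejected") response, and that you already have summed log-probabilities for both responses under the model being trained and under a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not taken from the dataset or any training recipe. Off-policy and on-policy pairs use the same objective; they differ only in whether the responses were sampled from a different source or from the policy (here, Nile-Chat-4B-SFT) itself.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    All inputs are summed token log-probabilities of a full response given the prompt.
    `beta` (hypothetical default) scales how strongly the policy may drift from the reference.
    """
    # Log-ratio of policy vs. reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) computed in a numerically stable way for either sign of z.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# When the policy equals the reference, the loss is log(2);
# it falls below log(2) as the policy favors the chosen response more than the reference does.
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
print(dpo_loss(-1.0, -10.0, -5.0, -5.0))
```

Minimizing this loss pushes the policy to raise the likelihood of the chosen response relative to the rejected one, measured against the reference model, without training a separate reward model.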
Provided by:
MBZUAI-Paris