Jianshu001/arabic-daily-pipeline-v4-100
Source: Hugging Face | Updated: 2026-04-24 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/Jianshu001/arabic-daily-pipeline-v4-100
Description:
This dataset contains protocol-compliant Arabic conversations, self-judged with the Gemma-4-31B model. It was built in three steps:
1) Generation with v4 protocol-compliant prompts, covering both the assistant and the user side.
2) Cleanup with Gemma as a rewriter, which removes draft scaffolding, role labels, and similar leftovers.
3) A six-dimension judge over the full conversation (realism, assistant_quality, multi_turn, domain_fit, safety, integrity); any conversation judged dirty is dropped.
Of 100 generated samples, 84 were kept and 16 were dropped. Drops by dimension: realism 13, multi_turn 15, domain_fit 1. These counts sum to 29, more than the 16 dropped samples, so some samples failed on more than one dimension.
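The keep/drop rule in step 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline's actual code: the `judge` callable stands in for the Gemma-4-31B judge, and the `Conversation`/`Verdict` types, the `filter_conversations` helper, and the toy judge are all hypothetical. It assumes the judge returns a pass/fail flag per dimension and that a conversation is kept only when all six dimensions pass.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# The six judge dimensions named in the dataset card.
DIMENSIONS = (
    "realism",
    "assistant_quality",
    "multi_turn",
    "domain_fit",
    "safety",
    "integrity",
)

# Hypothetical types: a conversation is a list of {"role", "content"} turns,
# and a verdict maps each dimension to True (clean) or False (dirty).
Conversation = List[Dict[str, str]]
Verdict = Dict[str, bool]


def filter_conversations(
    conversations: List[Conversation],
    judge: Callable[[Conversation], Verdict],
) -> Tuple[List[Conversation], Counter]:
    """Keep a conversation only if the judge marks every dimension clean.

    Returns the kept conversations and a per-dimension tally of failures.
    A conversation failing several dimensions is counted under each one,
    so the tally can sum to more than the number of dropped samples.
    """
    kept: List[Conversation] = []
    drop_reasons: Counter = Counter()
    for conv in conversations:
        verdict = judge(conv)
        # Assumption: a dimension missing from the verdict counts as a failure.
        failed = [d for d in DIMENSIONS if not verdict.get(d, False)]
        if failed:
            drop_reasons.update(failed)  # dirty -> drop
        else:
            kept.append(conv)
    return kept, drop_reasons


if __name__ == "__main__":
    def toy_judge(conv: Conversation) -> Verdict:
        # Hypothetical stand-in for the Gemma-4-31B judge, for illustration only.
        verdict = {d: True for d in DIMENSIONS}
        if len(conv) < 4:  # fewer than two user/assistant exchanges
            verdict["multi_turn"] = False
        if any("[DRAFT]" in turn["content"] for turn in conv):
            verdict["realism"] = False  # leftover scaffolding reads as unrealistic
        return verdict

    def turns(*texts: str) -> Conversation:
        # Alternate user/assistant roles, starting with the user.
        roles = ("user", "assistant")
        return [{"role": roles[i % 2], "content": t} for i, t in enumerate(texts)]

    samples = [
        turns("مرحبا", "أهلا", "كيف حالك؟", "بخير"),  # clean
        turns("مرحبا", "[DRAFT] أهلا"),  # fails realism and multi_turn
        turns("سؤال", "جواب"),  # fails multi_turn only
    ]
    kept, reasons = filter_conversations(samples, toy_judge)
    print(len(kept), len(samples) - len(kept), dict(reasons))
    # → 1 2 {'realism': 1, 'multi_turn': 2}
```

Counting every failing dimension, rather than a single reason per sample, is what allows per-dimension tallies to exceed the number of dropped samples, as in the realism/multi_turn/domain_fit breakdown above.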
Provider:
Jianshu001



