Jianshu001/arabic-daily-batch01-cascade-86
Source: Hugging Face. Updated: 2026-04-22. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Jianshu001/arabic-daily-batch01-cascade-86
Description:
This is an Arabic dialogue dataset of 86 records, obtained by dropping 14 problematic records from a 100-record cascade run. The data was rewritten and cleaned using Gemma-4, with Gemma acting as the rewriter. Generation involved three steps: cascade regeneration, Gemma rewriting, and cleaning. The schema contains fields for user messages and assistant messages; all audit fields have been stripped. Known limitations: some records were dropped because of truncation during rewriting, and most non-cascade assistant turns were also run through the rewriter.
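The cleaning step described above (dropping problematic records and stripping audit fields) might look roughly like the sketch below. The field names `user`, `assistant`, `truncated`, and `rewriter` are hypothetical placeholders chosen for illustration; the dataset's actual column names and the authors' real pipeline are not given in this listing.

```python
from typing import Any

# Hypothetical message field names; the actual dataset schema may differ.
KEPT_FIELDS = ("user", "assistant")


def clean_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop records flagged as truncated and keep only the message fields."""
    cleaned = []
    for record in records:
        # "truncated" is an assumed audit flag, not a documented column.
        if record.get("truncated", False):
            continue
        # Keeping only the message fields strips every audit field.
        cleaned.append({key: record[key] for key in KEPT_FIELDS if key in record})
    return cleaned


# Toy data mirroring the counts in the description: 100 in, 14 dropped.
raw = [
    {"user": f"q{i}", "assistant": f"a{i}", "truncated": i < 14, "rewriter": "gemma"}
    for i in range(100)
]
cleaned = clean_records(raw)
print(len(cleaned))  # 86
```

Filtering before stripping matters here: once the audit fields are removed, the information needed to identify problematic records is gone.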
Provider:
Jianshu001



