Chinese-dpo-pairs
Source: ModelScope community. Updated 2025-11-12. Indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/Chinese-dpo-pairs
Description:
# Dataset Card for Chinese-dpo-pairs
Well-curated 10K preference pairs in Chinese. The data were created by translating samples from multiple sources with GPT-3.5, including:
- flan_v2, sharegpt, ultrachat, evol_instruct and false_qa. Sampled from [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)
- open_orca. From [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
- truthy_dpo. From [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1)
To ensure quality, I originally translated over 30K samples, then dropped every translation whose line count or topic did not match its source. The dataset is best used together with the English source datasets listed above.
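The line-count check mentioned above can be illustrated with a minimal sketch. It is not the author's actual filtering script: the function names are hypothetical, and counting only non-empty lines is an assumption. The topic check is not covered here, since the card does not say how it was done.

```python
def line_count(text: str) -> int:
    """Count non-empty lines (assumption: blank lines are ignored)."""
    return sum(1 for line in text.splitlines() if line.strip())


def keep_translation(source: str, translation: str) -> bool:
    """Keep a translation only if its line count matches the source."""
    return line_count(source) == line_count(translation)


def filter_pairs(pairs):
    """Drop (source, translation) pairs whose line counts differ."""
    return [(src, tgt) for src, tgt in pairs if keep_translation(src, tgt)]


if __name__ == "__main__":
    # Hypothetical samples: the second translation merges two lines into one.
    samples = [
        ("Step 1: mix.\nStep 2: bake.", "第一步：搅拌。\n第二步：烘烤。"),
        ("Line one.\nLine two.", "第一行和第二行。"),
    ]
    for src, tgt in filter_pairs(samples):
        print(repr(src), "->", repr(tgt))
```

A mismatch in line count is a cheap signal that the translator merged, split, or dropped part of the text, which is why it works as a first-pass filter before any semantic check.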
Provider:
maas
Created:
2024-05-09



