
Chinese-dpo-pairs

Source: ModelScope community. Updated 2025-11-12; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/Chinese-dpo-pairs
Description:
# Dataset Card for Chinese-dpo-pairs

Well-curated 10K preference pairs in Chinese. The data were created by translating samples from multiple sources with GPT-3.5:

- flan_v2, sharegpt, ultrachat, evol_instruct and false_qa. Sampled from [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)
- open_orca. From [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
- truthy_dpo. From [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1)

To ensure quality, I originally translated over 30K samples, then dropped all translations whose line count or topic did not match the source. The dataset is best used together with the English datasets above.
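
The line-count check mentioned above could look roughly like the following. This is a minimal sketch, not the author's actual filtering code: the function names, the sample layout (a dict mapping a field name to an `(english, chinese)` tuple), and the choice to count only non-empty lines are all assumptions made for illustration. The topic check is not covered here.

```python
def line_count(text: str) -> int:
    """Count non-empty lines (assumption: blank lines should not cause a mismatch)."""
    return sum(1 for line in text.splitlines() if line.strip())


def keep_translation(source: str, translation: str) -> bool:
    """Keep a translation only if it has as many non-empty lines as its source."""
    return line_count(source) == line_count(translation)


def filter_pairs(samples):
    """Drop samples in which any translated field fails the line-count check.

    Each sample is a hypothetical dict: field name -> (english, chinese).
    """
    return [
        sample
        for sample in samples
        if all(keep_translation(en, zh) for en, zh in sample.values())
    ]


# Illustrative data only; these are not rows from the dataset.
samples = [
    {"prompt": ("List two colors.", "列出两种颜色。"),
     "chosen": ("Red\nBlue", "红色\n蓝色")},
    {"prompt": ("List two colors.", "列出两种颜色。"),
     "chosen": ("Red\nBlue", "红色和蓝色")},  # two lines collapsed into one
]

kept = filter_pairs(samples)
print(len(kept))  # 1
```

A check like this is cheap and catches translations that merged or split list items, which is one plausible reason to compare line counts before doing any more expensive topic comparison.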

Provider: maas
Created: 2024-05-09