opencsg/UltraFeedback-chinese
Source: Hugging Face | Updated: 2025-01-14 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/opencsg/UltraFeedback-chinese
Description:
UltraFeedback-Chinese is a Chinese-language dataset built with the construction method of the original UltraFeedback dataset and designed specifically for training robust reward and critic models. It supports two training methods, PPO (Proximal Policy Optimization) and DPO (Direct Preference Optimization), and keeps the same data format as the original UltraFeedback. It includes fine-grained ratings on four aspects: instruction-following, truthfulness, honesty, and helpfulness, with the ratings generated by the large language model deepseek-v3. The data is drawn from multiple Chinese resource collections, and the responses are generated by a variety of models. A variant named UltraFeedback-Chinese-Binarized is also available, built specifically for DPO.
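To illustrate how four-aspect ratings can be turned into the preference pairs that DPO needs, here is a minimal Python sketch. It is not the dataset authors' procedure. The field names (`instruction`, `completions`, `response`, `ratings`), the aspect keys, the sample record, and the "highest mean versus lowest mean" selection rule are all assumptions made for illustration; the real columns and the binarization rule used for UltraFeedback-Chinese-Binarized may differ.

```python
from statistics import mean

# Aspect names follow the description above; the real column names are an assumption.
ASPECTS = ("instruction_following", "truthfulness", "honesty", "helpfulness")


def overall_score(ratings):
    """Average the four aspect ratings into a single scalar."""
    return mean(ratings[aspect] for aspect in ASPECTS)


def binarize(record):
    """Build one (prompt, chosen, rejected) triple from a multi-response record.

    Hypothetical rule: the response with the highest mean rating is 'chosen',
    the one with the lowest mean rating is 'rejected'. Returns None when the
    record has fewer than two responses or when best and worst scores tie.
    """
    completions = record["completions"]
    if len(completions) < 2:
        return None
    ranked = sorted(
        completions,
        key=lambda c: overall_score(c["ratings"]),
        reverse=True,
    )
    best, worst = ranked[0], ranked[-1]
    if overall_score(best["ratings"]) == overall_score(worst["ratings"]):
        return None
    return {
        "prompt": record["instruction"],
        "chosen": best["response"],
        "rejected": worst["response"],
    }


# Made-up record in the assumed layout, for demonstration only.
example = {
    "instruction": "Explain what a reward model is in one sentence.",
    "completions": [
        {"response": "answer A", "ratings": {
            "instruction_following": 5, "truthfulness": 5,
            "honesty": 4, "helpfulness": 5}},
        {"response": "answer B", "ratings": {
            "instruction_following": 2, "truthfulness": 3,
            "honesty": 3, "helpfulness": 2}},
        {"response": "answer C", "ratings": {
            "instruction_following": 4, "truthfulness": 4,
            "honesty": 4, "helpfulness": 3}},
    ],
}

pair = binarize(example)
print(pair)
```

The per-aspect scores can also be kept as they are and used as regression targets for a reward model in a PPO pipeline, while the binarized triples suit DPO, which only needs a preferred and a dispreferred response per prompt.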
Provided by:
opencsg



