opencsg/UltraFeedback-chinese

Name: opencsg/UltraFeedback-chinese
Creator: opencsg
Published: 2025-01-14 11:09:58
License: No description available

Source: Hugging Face. Updated: 2025-01-14. Indexed: 2025-04-12.

Download link:

https://hf-mirror.com/datasets/opencsg/UltraFeedback-chinese

Overview:

UltraFeedback-Chinese is a Chinese-language dataset built with the same construction method as the UltraFeedback dataset and designed for training reward models and critic models. It supports two training methods, PPO (Proximal Policy Optimization) and DPO (Direct Preference Optimization), and keeps the same data format as the original UltraFeedback. It includes fine-grained ratings on four aspects: instruction-following, truthfulness, honesty, and helpfulness. The ratings were generated by the deepseek-v3 model. The data is sourced from multiple Chinese resource repositories, and the responses were generated with a variety of models. A variant of the dataset, UltraFeedback-Chinese-Binarized, is designed specifically for DPO.
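
To make the relationship between the four-aspect ratings and a DPO-style preference pair concrete, here is a minimal Python sketch. It is not the procedure used to build UltraFeedback-Chinese-Binarized, which this page does not describe. The field names (`instruction`, `completions`, `response`, `ratings`), the aspect keys, and the rule "highest mean rating is chosen, lowest is rejected" are all assumptions made for illustration, as is the sample record.

```python
from statistics import mean

# Assumed aspect keys, named after the four rating dimensions listed above.
ASPECTS = ("instruction_following", "truthfulness", "honesty", "helpfulness")


def overall_score(ratings):
    """Average the four aspect ratings into a single number."""
    return mean(ratings[aspect] for aspect in ASPECTS)


def binarize(record):
    """Turn one multi-response record into a (chosen, rejected) pair.

    Hypothetical rule: the response with the highest mean rating is
    'chosen' and the one with the lowest is 'rejected'. Returns None
    when there are fewer than two responses or the scores are tied.
    """
    completions = record["completions"]
    if len(completions) < 2:
        return None
    ranked = sorted(completions, key=lambda c: overall_score(c["ratings"]))
    rejected, chosen = ranked[0], ranked[-1]
    if overall_score(chosen["ratings"]) == overall_score(rejected["ratings"]):
        return None
    return {
        "prompt": record["instruction"],
        "chosen": chosen["response"],
        "rejected": rejected["response"],
    }


# Made-up record in the assumed layout, for illustration only.
example = {
    "instruction": "Explain reinforcement learning in one sentence.",
    "completions": [
        {
            "response": "weak answer",
            "ratings": {"instruction_following": 2, "truthfulness": 3,
                        "honesty": 3, "helpfulness": 2},
        },
        {
            "response": "good answer",
            "ratings": {"instruction_following": 5, "truthfulness": 5,
                        "honesty": 4, "helpfulness": 5},
        },
    ],
}

pair = binarize(example)
```

For PPO-style training, the same per-aspect ratings could instead serve as regression targets for a reward model; the pairwise form above is what DPO consumes.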

Provider:

opencsg
