teamcore/DPO_Pm3B_U0_beta0.25dpo_proEurus_RM_7bg
Source: Hugging Face. Updated 2025-10-23. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/teamcore/DPO_Pm3B_U0_beta0.25dpo_proEurus_RM_7bg
Description:
Each record includes the source text, instruction, and model output; user ratings with rationales for completeness, honesty, instruction adherence, and truthfulness; a critique, custom system prompt, fine-grained score, model name, overall score, principle, and response; correct and incorrect answers; the prompt with its chosen and rejected options; the Eurus_RM_7b scores for the chosen and rejected options and the Eurus_RM_7b response probability; the generated and chosen reward scores; and the GPT score and GPT feedback. The default split contains 1,000 samples.
Provider:
teamcore



