andrewsiah/rewarded-Eurus-RM-7b_s126_e251

Source: Hugging Face. Updated: 2024-05-23. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/andrewsiah/rewarded-Eurus-RM-7b_s126_e251
Resource description:
--- dataset_info: features: - name: reward_1 dtype: float64 - name: reward_2 dtype: float64 - name: reward_3 dtype: float64 - name: reward_4 dtype: float64 - name: reward_5 dtype: float64 - name: reward_6 dtype: float64 - name: reward_7 dtype: float64 - name: reward_8 dtype: float64 - name: reward_9 dtype: float64 - name: reward_10 dtype: float64 - name: reward_11 dtype: float64 - name: reward_12 dtype: float64 - name: reward_13 dtype: float64 - name: reward_14 dtype: float64 - name: reward_15 dtype: float64 - name: reward_16 dtype: float64 - name: reward_17 dtype: float64 - name: reward_18 dtype: float64 - name: reward_19 dtype: float64 - name: reward_20 dtype: float64 - name: reward_21 dtype: float64 - name: reward_22 dtype: float64 - name: reward_23 dtype: float64 - name: reward_24 dtype: float64 - name: reward_25 dtype: float64 - name: reward_26 dtype: float64 - name: reward_27 dtype: float64 - name: reward_28 dtype: float64 - name: reward_29 dtype: float64 - name: reward_30 dtype: float64 - name: reward_31 dtype: float64 - name: reward_32 dtype: float64 - name: reward_33 dtype: float64 - name: reward_34 dtype: float64 - name: reward_35 dtype: float64 - name: reward_36 dtype: float64 - name: reward_37 dtype: float64 - name: reward_38 dtype: float64 - name: reward_39 dtype: float64 - name: reward_40 dtype: float64 - name: reward_41 dtype: float64 - name: reward_42 dtype: float64 - name: reward_43 dtype: float64 - name: reward_44 dtype: float64 - name: reward_45 dtype: float64 - name: reward_46 dtype: float64 - name: reward_47 dtype: float64 - name: reward_48 dtype: float64 - name: reward_49 dtype: float64 - name: reward_50 dtype: float64 - name: reward_51 dtype: float64 - name: reward_52 dtype: float64 - name: reward_53 dtype: float64 - name: reward_54 dtype: float64 - name: reward_55 dtype: float64 - name: reward_56 dtype: float64 - name: reward_57 dtype: float64 - name: reward_58 dtype: float64 - name: reward_59 dtype: float64 - name: reward_60 dtype: float64 
- name: reward_61 dtype: float64 - name: reward_62 dtype: float64 - name: reward_63 dtype: float64 - name: reward_64 dtype: float64 - name: reward_65 dtype: float64 - name: reward_66 dtype: float64 - name: reward_67 dtype: float64 - name: reward_68 dtype: float64 - name: reward_69 dtype: float64 - name: reward_70 dtype: float64 - name: reward_71 dtype: float64 - name: reward_72 dtype: float64 - name: reward_73 dtype: float64 - name: reward_74 dtype: float64 - name: reward_75 dtype: float64 - name: reward_76 dtype: float64 - name: reward_77 dtype: float64 - name: reward_78 dtype: float64 - name: reward_79 dtype: float64 - name: reward_80 dtype: float64 - name: reward_81 dtype: float64 - name: reward_82 dtype: float64 - name: reward_83 dtype: float64 - name: reward_84 dtype: float64 - name: reward_85 dtype: float64 - name: reward_86 dtype: float64 - name: reward_87 dtype: float64 - name: reward_88 dtype: float64 - name: reward_89 dtype: float64 - name: reward_90 dtype: float64 - name: reward_91 dtype: float64 - name: reward_92 dtype: float64 - name: reward_93 dtype: float64 - name: reward_94 dtype: float64 - name: reward_95 dtype: float64 - name: reward_96 dtype: float64 - name: reward_97 dtype: float64 - name: reward_98 dtype: float64 - name: reward_99 dtype: float64 - name: reward_100 dtype: float64 - name: prompt dtype: string - name: subset dtype: string - name: rewardbench_chosen dtype: string - name: rewardbench_chosen_model dtype: string - name: rewardbench_rejected dtype: string - name: rewardbench_rejected_model dtype: string - name: response_1 dtype: string - name: response_1_model dtype: string - name: response_2 dtype: string - name: response_2_model dtype: string - name: response_3 dtype: string - name: response_3_model dtype: string - name: response_4 dtype: string - name: response_4_model dtype: string - name: response_5 dtype: string - name: response_5_model dtype: string - name: response_6 dtype: string - name: response_6_model dtype: string - name: 
response_7 dtype: string - name: response_7_model dtype: string - name: response_8 dtype: string - name: response_8_model dtype: string - name: response_9 dtype: string - name: response_9_model dtype: string - name: response_10 dtype: string - name: response_10_model dtype: string - name: response_11 dtype: string - name: response_11_model dtype: string - name: response_12 dtype: string - name: response_12_model dtype: string - name: response_13 dtype: string - name: response_13_model dtype: string - name: response_14 dtype: string - name: response_14_model dtype: string - name: response_15 dtype: string - name: response_15_model dtype: string - name: response_16 dtype: string - name: response_16_model dtype: string - name: response_17 dtype: string - name: response_17_model dtype: string - name: response_18 dtype: string - name: response_18_model dtype: string - name: response_19 dtype: string - name: response_19_model dtype: string - name: response_20 dtype: string - name: response_20_model dtype: string - name: response_21 dtype: string - name: response_21_model dtype: string - name: response_22 dtype: string - name: response_22_model dtype: string - name: response_23 dtype: string - name: response_23_model dtype: string - name: response_24 dtype: string - name: response_24_model dtype: string - name: response_25 dtype: string - name: response_25_model dtype: string - name: response_26 dtype: string - name: response_26_model dtype: string - name: response_27 dtype: string - name: response_27_model dtype: string - name: response_28 dtype: string - name: response_28_model dtype: string - name: response_29 dtype: string - name: response_29_model dtype: string - name: response_30 dtype: string - name: response_30_model dtype: string - name: response_31 dtype: string - name: response_31_model dtype: string - name: response_32 dtype: string - name: response_32_model dtype: string - name: response_33 dtype: string - name: response_33_model dtype: string - name: 
response_34 dtype: string - name: response_34_model dtype: string - name: response_35 dtype: string - name: response_35_model dtype: string - name: response_36 dtype: string - name: response_36_model dtype: string - name: response_37 dtype: string - name: response_37_model dtype: string - name: response_38 dtype: string - name: response_38_model dtype: string - name: response_39 dtype: string - name: response_39_model dtype: string - name: response_40 dtype: string - name: response_40_model dtype: string - name: response_41 dtype: string - name: response_41_model dtype: string - name: response_42 dtype: string - name: response_42_model dtype: string - name: response_43 dtype: string - name: response_43_model dtype: string - name: response_44 dtype: string - name: response_44_model dtype: string - name: response_45 dtype: string - name: response_45_model dtype: string - name: response_46 dtype: string - name: response_46_model dtype: string - name: response_47 dtype: string - name: response_47_model dtype: string - name: response_48 dtype: string - name: response_48_model dtype: string - name: response_49 dtype: string - name: response_49_model dtype: string - name: response_50 dtype: string - name: response_50_model dtype: string - name: response_51 dtype: string - name: response_51_model dtype: string - name: response_52 dtype: string - name: response_52_model dtype: string - name: response_53 dtype: string - name: response_53_model dtype: string - name: response_54 dtype: string - name: response_54_model dtype: string - name: response_55 dtype: string - name: response_55_model dtype: string - name: response_56 dtype: string - name: response_56_model dtype: string - name: response_57 dtype: string - name: response_57_model dtype: string - name: response_58 dtype: string - name: response_58_model dtype: string - name: response_59 dtype: string - name: response_59_model dtype: string - name: response_60 dtype: string - name: response_60_model dtype: string - name: 
response_61 dtype: string - name: response_61_model dtype: string - name: response_62 dtype: string - name: response_62_model dtype: string - name: response_63 dtype: string - name: response_63_model dtype: string - name: response_64 dtype: string - name: response_64_model dtype: string - name: response_65 dtype: string - name: response_65_model dtype: string - name: response_66 dtype: string - name: response_66_model dtype: string - name: response_67 dtype: string - name: response_67_model dtype: string - name: response_68 dtype: string - name: response_68_model dtype: string - name: response_69 dtype: string - name: response_69_model dtype: string - name: response_70 dtype: string - name: response_70_model dtype: string - name: response_71 dtype: string - name: response_71_model dtype: string - name: response_72 dtype: string - name: response_72_model dtype: string - name: response_73 dtype: string - name: response_73_model dtype: string - name: response_74 dtype: string - name: response_74_model dtype: string - name: response_75 dtype: string - name: response_75_model dtype: string - name: response_76 dtype: string - name: response_76_model dtype: string - name: response_77 dtype: string - name: response_77_model dtype: string - name: response_78 dtype: string - name: response_78_model dtype: string - name: response_79 dtype: string - name: response_79_model dtype: string - name: response_80 dtype: string - name: response_80_model dtype: string - name: response_81 dtype: string - name: response_81_model dtype: string - name: response_82 dtype: string - name: response_82_model dtype: string - name: response_83 dtype: string - name: response_83_model dtype: string - name: response_84 dtype: string - name: response_84_model dtype: string - name: response_85 dtype: string - name: response_85_model dtype: string - name: response_86 dtype: string - name: response_86_model dtype: string - name: response_87 dtype: string - name: response_87_model dtype: string - name: 
response_88 dtype: string - name: response_88_model dtype: string - name: response_89 dtype: string - name: response_89_model dtype: string - name: response_90 dtype: string - name: response_90_model dtype: string - name: response_91 dtype: string - name: response_91_model dtype: string - name: response_92 dtype: string - name: response_92_model dtype: string - name: response_93 dtype: string - name: response_93_model dtype: string - name: response_94 dtype: string - name: response_94_model dtype: string - name: response_95 dtype: string - name: response_95_model dtype: string - name: response_96 dtype: string - name: response_96_model dtype: string - name: response_97 dtype: string - name: response_97_model dtype: string - name: response_98 dtype: string - name: response_98_model dtype: string - name: response_99 dtype: string - name: response_99_model dtype: string - name: response_100 dtype: string - name: response_100_model dtype: string - name: rformatted_prompt_response_1 dtype: string - name: rformatted_prompt_response_2 dtype: string - name: rformatted_prompt_response_3 dtype: string - name: rformatted_prompt_response_4 dtype: string - name: rformatted_prompt_response_5 dtype: string - name: rformatted_prompt_response_6 dtype: string - name: rformatted_prompt_response_7 dtype: string - name: rformatted_prompt_response_8 dtype: string - name: rformatted_prompt_response_9 dtype: string - name: rformatted_prompt_response_10 dtype: string - name: rformatted_prompt_response_11 dtype: string - name: rformatted_prompt_response_12 dtype: string - name: rformatted_prompt_response_13 dtype: string - name: rformatted_prompt_response_14 dtype: string - name: rformatted_prompt_response_15 dtype: string - name: rformatted_prompt_response_16 dtype: string - name: rformatted_prompt_response_17 dtype: string - name: rformatted_prompt_response_18 dtype: string - name: rformatted_prompt_response_19 dtype: string - name: rformatted_prompt_response_20 dtype: string - name: 
rformatted_prompt_response_21 dtype: string - name: rformatted_prompt_response_22 dtype: string - name: rformatted_prompt_response_23 dtype: string - name: rformatted_prompt_response_24 dtype: string - name: rformatted_prompt_response_25 dtype: string - name: rformatted_prompt_response_26 dtype: string - name: rformatted_prompt_response_27 dtype: string - name: rformatted_prompt_response_28 dtype: string - name: rformatted_prompt_response_29 dtype: string - name: rformatted_prompt_response_30 dtype: string - name: rformatted_prompt_response_31 dtype: string - name: rformatted_prompt_response_32 dtype: string - name: rformatted_prompt_response_33 dtype: string - name: rformatted_prompt_response_34 dtype: string - name: rformatted_prompt_response_35 dtype: string - name: rformatted_prompt_response_36 dtype: string - name: rformatted_prompt_response_37 dtype: string - name: rformatted_prompt_response_38 dtype: string - name: rformatted_prompt_response_39 dtype: string - name: rformatted_prompt_response_40 dtype: string - name: rformatted_prompt_response_41 dtype: string - name: rformatted_prompt_response_42 dtype: string - name: rformatted_prompt_response_43 dtype: string - name: rformatted_prompt_response_44 dtype: string - name: rformatted_prompt_response_45 dtype: string - name: rformatted_prompt_response_46 dtype: string - name: rformatted_prompt_response_47 dtype: string - name: rformatted_prompt_response_48 dtype: string - name: rformatted_prompt_response_49 dtype: string - name: rformatted_prompt_response_50 dtype: string - name: rformatted_prompt_response_51 dtype: string - name: rformatted_prompt_response_52 dtype: string - name: rformatted_prompt_response_53 dtype: string - name: rformatted_prompt_response_54 dtype: string - name: rformatted_prompt_response_55 dtype: string - name: rformatted_prompt_response_56 dtype: string - name: rformatted_prompt_response_57 dtype: string - name: rformatted_prompt_response_58 dtype: string - name: 
rformatted_prompt_response_59 dtype: string - name: rformatted_prompt_response_60 dtype: string - name: rformatted_prompt_response_61 dtype: string - name: rformatted_prompt_response_62 dtype: string - name: rformatted_prompt_response_63 dtype: string - name: rformatted_prompt_response_64 dtype: string - name: rformatted_prompt_response_65 dtype: string - name: rformatted_prompt_response_66 dtype: string - name: rformatted_prompt_response_67 dtype: string - name: rformatted_prompt_response_68 dtype: string - name: rformatted_prompt_response_69 dtype: string - name: rformatted_prompt_response_70 dtype: string - name: rformatted_prompt_response_71 dtype: string - name: rformatted_prompt_response_72 dtype: string - name: rformatted_prompt_response_73 dtype: string - name: rformatted_prompt_response_74 dtype: string - name: rformatted_prompt_response_75 dtype: string - name: rformatted_prompt_response_76 dtype: string - name: rformatted_prompt_response_77 dtype: string - name: rformatted_prompt_response_78 dtype: string - name: rformatted_prompt_response_79 dtype: string - name: rformatted_prompt_response_80 dtype: string - name: rformatted_prompt_response_81 dtype: string - name: rformatted_prompt_response_82 dtype: string - name: rformatted_prompt_response_83 dtype: string - name: rformatted_prompt_response_84 dtype: string - name: rformatted_prompt_response_85 dtype: string - name: rformatted_prompt_response_86 dtype: string - name: rformatted_prompt_response_87 dtype: string - name: rformatted_prompt_response_88 dtype: string - name: rformatted_prompt_response_89 dtype: string - name: rformatted_prompt_response_90 dtype: string - name: rformatted_prompt_response_91 dtype: string - name: rformatted_prompt_response_92 dtype: string - name: rformatted_prompt_response_93 dtype: string - name: rformatted_prompt_response_94 dtype: string - name: rformatted_prompt_response_95 dtype: string - name: rformatted_prompt_response_96 dtype: string - name: 
rformatted_prompt_response_97 dtype: string - name: rformatted_prompt_response_98 dtype: string - name: rformatted_prompt_response_99 dtype: string - name: rformatted_prompt_response_100 dtype: string splits: - name: train num_bytes: 43484992 num_examples: 125 download_size: 23102703 dataset_size: 43484992 configs: - config_name: default data_files: - split: train path: data/train-* ---
Provider:
andrewsiah
Original information summary

Dataset overview

Dataset features

The dataset contains the following features:

  • Reward features

    • 100 reward features, named reward_1 through reward_100, each of dtype float64
  • Other features

    • prompt: dtype string
    • subset: dtype string
    • rewardbench_chosen: dtype string
    • rewardbench_chosen_model: dtype string
    • rewardbench_rejected: dtype string
    • rewardbench_rejected_model: dtype string
    • response_1 through response_100 and their corresponding model identifiers response_1_model through response_100_model, each of dtype string
    • rformatted_prompt_response_1 through rformatted_prompt_response_100, each of dtype string
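
The column layout above follows a regular naming pattern, so it can be enumerated programmatically. The sketch below rebuilds the expected column names in schema order; the helper name `expected_columns` is an illustrative assumption and not part of the dataset.

```python
def expected_columns(n: int = 100) -> list[str]:
    """Rebuild the column names listed in the schema, in schema order."""
    cols = [f"reward_{i}" for i in range(1, n + 1)]
    cols += [
        "prompt",
        "subset",
        "rewardbench_chosen",
        "rewardbench_chosen_model",
        "rewardbench_rejected",
        "rewardbench_rejected_model",
    ]
    # Each response column is immediately followed by its model identifier.
    for i in range(1, n + 1):
        cols += [f"response_{i}", f"response_{i}_model"]
    cols += [f"rformatted_prompt_response_{i}" for i in range(1, n + 1)]
    return cols


columns = expected_columns()
print(len(columns))  # 100 rewards + 6 metadata + 200 response/model + 100 formatted
```

Comparing this list against the column names of a loaded copy is a quick way to confirm the download matches the schema.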

Dataset splits

  • Training set
    • Name: train
    • Size:
      • Bytes: 43484992
      • Examples: 125
    • Download size: 23102703 bytes
    • Dataset size: 43484992 bytes
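
The size figures above imply that each record is large, since every row carries 100 responses plus their formatted prompts. A small arithmetic sketch using only the numbers from the split metadata (the variable names are illustrative):

```python
num_bytes = 43_484_992      # dataset size from the split metadata
download_size = 23_102_703  # download size from the split metadata
num_examples = 125          # number of rows in the train split

bytes_per_example = num_bytes / num_examples
download_ratio = download_size / num_bytes

print(f"{bytes_per_example / 1024:.1f} KiB per example")
print(f"download is {download_ratio:.1%} of the dataset size")
```

This works out to roughly 340 KiB per example, with the download at about 53% of the dataset size.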
Collected summary
Dataset introduction
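Construction
Against the backdrop of rapid progress in reinforcement learning and alignment, this dataset was built through a systematic pipeline. It originates from the RewardBench benchmark: prompts covering a variety of scenarios were collected first, and up to 100 candidate responses from different models were generated for each prompt. A purpose-trained reward model (Eurus-RM-7b) then evaluated the responses automatically and assigned each one a scalar reward score. This process keeps data generation scalable and consistent, and yields a structured dataset of prompts, multi-model responses, and their corresponding reward scores.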
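Characteristics
The dataset's most notable features are its high-dimensional reward annotations and the diversity of its responses. Each record contains the original prompt together with 100 text responses from different models, and each response carries a floating-point reward value, so the records together form a dense score matrix. This structure provides fine-grained data for studying the discriminative granularity of reward models, comparing the distribution of generation quality across models, and examining the mechanics of preference alignment at the level of individual responses. The dataset also retains the model identifier for each response, which makes it possible to trace scores back to the generating model.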
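
As a rough illustration of the dense score matrix and the per-model tracing described above, the sketch below collects the reward columns of each record into a vector and averages rewards by generating model. The function names, the toy record, and its model names and scores are assumptions made up for the example; only the column naming pattern comes from the schema.

```python
from collections import defaultdict
from statistics import mean

N_RESPONSES = 100  # responses per prompt, per the schema


def reward_vector(row: dict, n: int = N_RESPONSES) -> list[float]:
    """Collect reward_1..reward_n of one record into a list."""
    return [row[f"reward_{i}"] for i in range(1, n + 1)]


def mean_reward_by_model(rows: list[dict], n: int = N_RESPONSES) -> dict[str, float]:
    """Average reward per generating model across all records."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        for i in range(1, n + 1):
            buckets[row[f"response_{i}_model"]].append(row[f"reward_{i}"])
    return {model: mean(scores) for model, scores in buckets.items()}


# Toy record with 3 responses; model names and scores are made up.
toy = {
    "reward_1": 0.5, "response_1_model": "model-a",
    "reward_2": -1.0, "response_2_model": "model-b",
    "reward_3": 1.5, "response_3_model": "model-a",
}
matrix = [reward_vector(toy, n=3)]  # one row per prompt
per_model = mean_reward_by_model([toy], n=3)
print(matrix, per_model)
```

With the real data, `matrix` would have 125 rows and 100 columns, one row per prompt.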
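Usage
The dataset mainly serves academic research on reward model training, reinforcement learning policy optimization, and large language model alignment. It can be loaded directly with the Hugging Face library and parsed into triplets of prompt, response list, and reward score list. In practice, researchers can use the data to train or fine-tune new reward models, fitting the existing scoring pattern through supervised learning. It can also serve as an experience pool for offline reinforcement learning, training a policy model to generate high-reward responses. In addition, analyzing how reward scores are distributed across responses and models makes it possible to evaluate and compare the generation preferences and quality of different language models.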
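
A minimal sketch of the loading and triplet parsing described above, assuming the `datasets` package is installed and the repository is reachable under the ID shown in the page title. `row_to_triplet` and `load_triplets` are hypothetical helper names; the offline check at the bottom uses a made-up record so the parsing logic can be tried without a download.

```python
def row_to_triplet(row: dict, n: int = 100) -> tuple[str, list[str], list[float]]:
    """Split one record into (prompt, responses, rewards), aligned by index."""
    responses = [row[f"response_{i}"] for i in range(1, n + 1)]
    rewards = [row[f"reward_{i}"] for i in range(1, n + 1)]
    return row["prompt"], responses, rewards


def load_triplets(repo_id: str = "andrewsiah/rewarded-Eurus-RM-7b_s126_e251"):
    """Download the train split and convert every record (needs network access)."""
    from datasets import load_dataset  # pip install datasets

    ds = load_dataset(repo_id, split="train")
    return [row_to_triplet(row) for row in ds]


# Offline check on a made-up two-response record.
toy = {
    "prompt": "2+2?",
    "response_1": "4", "reward_1": 1.0,
    "response_2": "5", "reward_2": -1.0,
}
prompt, responses, rewards = row_to_triplet(toy, n=2)
best = responses[max(range(len(rewards)), key=rewards.__getitem__)]
print(prompt, best)
```

Picking the highest-reward response, as in the last lines, is one simple way to turn the triplets into best-of-n training targets.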
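Background and challenges
Background overview
At the research frontier of reinforcement learning from human feedback and alignment, the reward model is a key component for optimizing the behavior of large language models, and evaluating it depends on high-quality datasets. The dataset andrewsiah/rewarded-Eurus-RM-7b_s126_e251 was created in this context and recently built by researcher Andrew Siah. It aims to provide structured support for reward model training and benchmarking through systematic reward score annotation. Its core research question is how to quantify the quality of different models' responses to diverse prompts, so that alignment techniques can become more precise and interpretable. Its impact lies in giving the community a reproducible evaluation framework, which helps speed the development of safe and reliable AI systems.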
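Current challenges
The dataset addresses a core challenge in reward model evaluation: establishing a unified, objective metric for comparing the outputs of different models. During construction, the first challenge is designing prompt and response pairs that cover a wide range of scenarios, so that the evaluation is comprehensive and representative. Second, assigning a reward score to each response must overcome subjective bias, which requires the annotation process to be highly consistent and verifiable. Finally, integrating the outputs of many models while keeping the data structure intact places heavy demands on the data processing pipeline, and an oversight at any step can undermine the reliability of the final evaluation.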
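Common scenarios
Classic use cases
In academic work on reinforcement learning from human feedback and alignment, this dataset offers a reference example for training and evaluating reward models, thanks to its reward model scores and varied model responses. It combines the differing outputs of multiple large language models on the same prompts with systematic reward score annotation, forming a multi-dimensional, quantifiable framework for comparative evaluation. This framework lets researchers analyze how models differ along alignment dimensions such as safety, helpfulness, and honesty, and provides a data foundation for optimizing and iterating on reward models.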
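Practical applications
In practice, the dataset can directly support fine-tuning and safety alignment work on large language models. Development teams can use its reward scores to train or calibrate their own reward models, which then guide a policy model toward responses that better match human values. Before deploying AI applications in content moderation, assistant dialogue optimization, or high-risk domains such as medical and legal consultation, the multi-model response comparisons and scores in this dataset can serve as a reference for safety audits of model behavior and for performance benchmarking.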
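Derived and related work
Work derived from this dataset focuses mainly on new reward model architectures and improved alignment algorithms. Researchers have used its fine-grained reward signals to develop more efficient preference modeling methods, such as extensions of the Bradley-Terry model or improvements to direct policy optimization algorithms. The dataset has also prompted research on the generalization of reward models, the detection of reward hacking, and trade-offs in multi-objective alignment, advancing techniques for encoding complex human value judgments as learnable reward functions.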
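
For readers unfamiliar with the Bradley-Terry model mentioned above: it turns two scalar rewards into a preference probability, P(a preferred over b) = sigmoid(r_a - r_b). The sketch below shows that formula and the matching pairwise negative log-likelihood in a numerically stable form. Applying it to this dataset's reward columns is an assumption for illustration; the page does not say whether the Eurus-RM-7b scores are calibrated for this use.

```python
import math


def bt_preference_prob(r_a: float, r_b: float) -> float:
    """Bradley-Terry probability that response a is preferred over response b."""
    d = r_a - r_b
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # d < 0, so exp(d) cannot overflow
    return e / (1.0 + e)


def bt_pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of the observed preference, -log sigmoid(d)."""
    d = r_chosen - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


p_equal = bt_preference_prob(0.0, 0.0)  # equal rewards give an even split
p_gap = bt_preference_prob(2.0, 0.0)    # a 2-point gap favors the first response
loss = bt_pairwise_loss(2.0, 0.0)
print(p_equal, round(p_gap, 4), round(loss, 4))
```

The loss shrinks as the chosen response's reward exceeds the rejected one's by a wider margin, which is the signal a reward model trained on pairs would fit.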
The content above was collected and summarized by 遇见数据集.