RewardBench Dataset

Name: RewardBench Dataset
Creator: Papers with Code
License: No description available

Indexed from paperswithcode.com on 2025-01-15

Download link:

https://paperswithcode.com/dataset/rewardbench

Description:

RewardBench is a benchmark for evaluating the capabilities and safety of reward models, including those trained with Direct Preference Optimization (DPO). It is described as the first evaluation tool for reward models and gives insight into their performance and reliability¹. Its key components are:

Common inference code: The repository includes common inference code for a range of reward models, such as Starling, PairRM, and OpenAssistant. These models can be evaluated with the provided tools¹.

Dataset and evaluation: The RewardBench dataset consists of prompt-win-lose trios spanning chat, reasoning, and safety scenarios. It supports benchmarking reward models on challenging, structured, and out-of-distribution queries. The goal is to improve the scientific understanding of reward models and their behavior².

Scripts for evaluation¹:
  scripts/run_rm.py: evaluates individual reward models.
  scripts/run_dpo.py: evaluates Direct Preference Optimization (DPO) models.
  scripts/train_rm.py: a basic reward model training script built on TRL (Transformer Reinforcement Learning).

Installation and usage¹:
  1. Install PyTorch on your system.
  2. Install the required dependencies with: pip install -e .
  3. Set the environment variable HF_TOKEN to your token.
  4. To contribute a model to the leaderboard, open an issue on HuggingFace with the model name.
  5. For local model evaluation, follow the instructions in the repository.

RewardBench provides a standardized way to assess reward models, which supports transparency and comparability across different approaches.

References:
(1) GitHub - allenai/reward-bench: RewardBench: the first evaluation tool .... https://github.com/allenai/reward-bench
(2) RewardBench: Evaluating Reward Models for Language Modeling. https://arxiv.org/abs/2403.13787
(3) RewardBench: Evaluating Reward Models for Language Modeling. https://paperswithcode.com/paper/rewardbench-evaluating-reward-models-for
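To make the prompt-win-lose evaluation described above concrete, here is a minimal sketch of one plausible way to score a reward model on such trios: count how often the model scores the winning response above the losing one, broken down by scenario. This is not the code from scripts/run_rm.py. The Trio class, its field names, the accuracy_by_category helper, the toy_score function, and the example data are all hypothetical and made up for illustration. Counting a tie as a miss is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Trio:
    """One hypothetical benchmark item: a prompt with a preferred and a rejected reply."""
    category: str  # e.g. "chat", "reasoning", "safety"
    prompt: str
    win: str       # the response a good reward model should score higher
    lose: str      # the response a good reward model should score lower


def accuracy_by_category(
    trios: Iterable[Trio],
    score: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return, per category, the fraction of trios where score(win) > score(lose)."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for trio in trios:
        totals[trio.category] += 1
        # Assumption: a tie counts as a miss (strict inequality).
        if score(trio.prompt, trio.win) > score(trio.prompt, trio.lose):
            hits[trio.category] += 1
    return {category: hits[category] / totals[category] for category in totals}


def toy_score(prompt: str, response: str) -> float:
    """Stand-in for a real reward model: counts response words that appear in the prompt."""
    prompt_words = set(prompt.lower().split())
    return float(sum(1 for word in response.lower().split() if word in prompt_words))


# Made-up examples, not taken from the RewardBench dataset.
EXAMPLES = [
    Trio("chat", "name a primary color", "red is a primary color", "bananas"),
    Trio("reasoning", "what is two plus two", "two plus two is four", "five"),
    Trio("safety", "how do I pick a lock", "I cannot help with that", "pick a lock with a pin"),
]

if __name__ == "__main__":
    for category, accuracy in accuracy_by_category(EXAMPLES, toy_score).items():
        print(f"{category}: {accuracy:.2f}")
```

With these made-up examples the word-overlap scorer gets the chat and reasoning items right but prefers the unsafe reply on the safety item, which is the kind of gap a per-scenario breakdown is meant to expose. A real reward model would replace toy_score with a function that returns the model's scalar reward for a prompt and response.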

Provider:

Papers with Code
