ZJU-REAL/VerifyBench

Name: ZJU-REAL/VerifyBench
Creator: ZJU-REAL
Published: 2026-02-25 02:48:35
License: No description available

Hugging Face · Updated 2026-02-25 · Indexed 2025-10-18

Download link:

https://hf-mirror.com/datasets/ZJU-REAL/VerifyBench

Description:

VerifyBench is a benchmark specifically designed to evaluate the accuracy of reference-based reward systems. It consists of instructions and reference answers sourced from existing open datasets, with responses generated by multiple open-source and proprietary LLMs. Each instance in VerifyBench is verified by at least two human annotators to ensure label consistency and reliability. VerifyBench-Hard is a more challenging variant that focuses on cases with high disagreement among models, providing a stricter test for the accuracy of reward systems. These datasets aim to provide an objective evaluation of the accuracy of reference-based reward systems and offer valuable insights for reinforcement learning training in reasoning tasks.
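The evaluation idea above (score a reference-based reward system by how often its verdict agrees with the human label) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the benchmark's official scorer. The `Instance` fields (`question`, `reference`, `response`, `human_label`) are hypothetical and may not match the real VerifyBench schema, and `exact_match_verifier` is a deliberately naive stand-in for the reward systems the benchmark actually targets.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


# Hypothetical record layout; the real VerifyBench field names may differ.
@dataclass
class Instance:
    question: str
    reference: str      # reference answer taken from the source dataset
    response: str       # completion produced by an LLM
    human_label: bool   # True if annotators judged the response correct


def normalize(text: str) -> str:
    """Toy normalization: trim whitespace, drop a trailing period, lower-case."""
    return text.strip().rstrip(".").strip().lower()


def exact_match_verifier(reference: str, response: str) -> bool:
    """Naive reference-based reward: correct iff answers match after normalization."""
    return normalize(reference) == normalize(response)


def verifier_accuracy(
    data: Iterable[Instance], verifier: Callable[[str, str], bool]
) -> float:
    """Fraction of instances where the verifier's verdict equals the human label."""
    items = list(data)
    if not items:
        raise ValueError("empty dataset")
    agree = sum(verifier(x.reference, x.response) == x.human_label for x in items)
    return agree / len(items)


if __name__ == "__main__":
    toy = [
        Instance("2+2?", "4", "4", True),
        Instance("Capital of France?", "Paris", "paris.", True),
        # Equivalent answer in a different form: the naive verifier misses it.
        Instance("1/2 as a decimal?", "0.5", "1/2", True),
        Instance("3*3?", "9", "6", False),
    ]
    print(verifier_accuracy(toy, exact_match_verifier))  # 0.75
```

The third toy record shows why such a benchmark is useful: a string-matching reward rejects a mathematically equivalent answer, so its agreement with human labels drops below 1.0.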

Provider:

ZJU-REAL