ZJU-REAL/VerifyBench

Source: Hugging Face. Updated 2026-02-25. Indexed 2025-10-18.
Download link:
https://hf-mirror.com/datasets/ZJU-REAL/VerifyBench
Description:
VerifyBench is a benchmark for evaluating the accuracy of reference-based reward systems. It is built from instructions and reference answers drawn from existing open datasets, with responses generated by multiple open-source and proprietary LLMs. Each instance is verified by at least two human annotators to ensure that labels are consistent and reliable. VerifyBench-Hard is a more challenging variant that focuses on cases where models disagree strongly, providing a stricter test of reward-system accuracy. Together, the two datasets aim to provide an objective evaluation of reference-based reward systems and to inform reinforcement learning training on reasoning tasks.
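
To make "accuracy of a reference-based reward system" concrete, the sketch below scores a toy verifier by how often its verdict agrees with a human label. Everything in it is an assumption made for illustration: the `Instance` field names, the sample records, and the exact-match verifier are hypothetical and do not come from the dataset or its authors.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    # Hypothetical record layout; the real dataset's field names may differ.
    question: str
    reference: str
    response: str
    human_label: bool  # True if annotators judged the response correct


def exact_match_verifier(reference: str, response: str) -> bool:
    """Toy reference-based reward: case-insensitive, whitespace-trimmed equality."""
    return reference.strip().lower() == response.strip().lower()


def verifier_accuracy(
    instances: List[Instance],
    verifier: Callable[[str, str], bool],
) -> float:
    """Fraction of instances where the verifier agrees with the human label."""
    if not instances:
        raise ValueError("no instances to evaluate")
    agree = sum(
        verifier(x.reference, x.response) == x.human_label for x in instances
    )
    return agree / len(instances)


# Made-up examples, not taken from VerifyBench.
SAMPLE = [
    Instance("2 + 2 = ?", "4", "4", True),
    Instance("2 + 2 = ?", "4", "5", False),
    Instance("Capital of France?", "Paris", " paris ", True),
    # Equivalent answer in a different form; exact match rejects it.
    Instance("1/2 as a decimal?", "0.5", "1/2", True),
]

print(verifier_accuracy(SAMPLE, exact_match_verifier))  # 0.75
```

The last sample shows the kind of case a stricter benchmark is meant to expose: the response is correct but not textually identical to the reference, so a naive verifier disagrees with the human label.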
Provider:
ZJU-REAL