opencompass/VerifierBench
Source: Hugging Face. Updated 2025-08-26; indexed 2025-08-09.
Download link:
https://hf-mirror.com/datasets/opencompass/VerifierBench
Description:
VerifierBench is a comprehensive benchmark for evaluating the verification capabilities of large language models (LLMs). It covers multiple domains such as math, knowledge, and science, and is designed to handle various answer types and to identify abnormal or invalid responses. The dataset consists of question-answer pairs drawn from multiple data sources, labeled and checked by human experts.
Provider:
opencompass



