MMInstruction/VL-RewardBench
Hugging Face | Updated 2025-05-19 | Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/MMInstruction/VL-RewardBench
Description:
VLRewardBench is a comprehensive benchmark for evaluating vision-language generative reward models (VL-GenRMs) on visual perception, hallucination detection, and reasoning tasks. It contains 1,250 high-quality examples specifically curated to probe model limitations. The examples are multimodal queries drawn from three key domains: general multimodal queries from real users, visual hallucination detection tasks, and multimodal knowledge and mathematical reasoning.
Provided by:
MMInstruction



