MM-Hallu/CP-Bench
Source: Hugging Face | Updated: 2026-04-30 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/MM-Hallu/CP-Bench
Description:
CP-Bench (Counterfactual Presupposition Benchmark) is a dataset for evaluating hallucination detection in vision-language models (VLMs). It contains 1,500 visual question answering (VQA) pairs over 1,180 unique images and tests whether a model correctly identifies counterfactual presuppositions in questions. There are two question types, counterfactual presupposition questions (cpq) and true presupposition questions (tpq), with 750 of each. Each record has four fields: image, image name, query (a natural-language question), and tag (the question type). Evaluation reports F1 score, accuracy, precision, and recall, with GPT-4o serving as the judge.
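The four reported metrics can be illustrated with a minimal sketch. It assumes, without confirmation from the dataset card, that cpq is treated as the positive class and that each judge verdict has already been mapped to a predicted tag of "cpq" or "tpq". The function name `presupposition_metrics` and the sample pairs are hypothetical and are not part of the benchmark's official tooling.

```python
from typing import Dict, Iterable, Tuple


def presupposition_metrics(
    pairs: Iterable[Tuple[str, str]], positive: str = "cpq"
) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from (gold_tag, predicted_tag) pairs.

    Assumption: `positive` ("cpq" by default) is the class being detected.
    """
    tp = fp = fn = tn = 0
    for gold, pred in pairs:
        if pred == positive and gold == positive:
            tp += 1  # counterfactual presupposition correctly flagged
        elif pred == positive:
            fp += 1  # true presupposition wrongly flagged
        elif gold == positive:
            fn += 1  # counterfactual presupposition missed
        else:
            tn += 1  # true presupposition correctly accepted

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical verdicts for six questions: (gold tag, tag implied by the judge).
example = [
    ("cpq", "cpq"),
    ("cpq", "tpq"),
    ("tpq", "tpq"),
    ("tpq", "cpq"),
    ("cpq", "cpq"),
    ("tpq", "tpq"),
]
scores = presupposition_metrics(example)
print(scores)
```

Because the two question types are balanced at 750 each, accuracy and F1 tend to move together here, but precision and recall still separate a model that over-flags presuppositions from one that misses them.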
Provider:
MM-Hallu



