MM-Hallu/HumbleBench

Name: MM-Hallu/HumbleBench
Creator: MM-Hallu
Published: 2026-04-26 06:43:55
License: MIT

Source: Hugging Face. Updated 2026-04-26; indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/MM-Hallu/HumbleBench

Resource overview:

---
license: mit
task_categories:
- visual-question-answering
tags:
- hallucination
- benchmark
- multimodal
- humility
- epistemic-humility
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: train-*.parquet
dataset_info:
  features:
  - name: image
    dtype: image
  - name: question_id
    dtype: int64
  - name: question
    dtype: string
  - name: label
    dtype: string
  - name: type
    dtype: string
  splits:
  - name: train
    num_examples: 22831
---

# HumbleBench

HumbleBench is a multimodal hallucination benchmark for evaluating epistemic humility in Multimodal Large Language Models (MLLMs). It tests whether models can recognize when none of the provided answer options are correct, a behavior reflecting epistemic humility.

## Paper

**Measuring Epistemic Humility in Multimodal Large Language Models**

## Dataset Structure

- **Total examples**: 22,831
- **Unique images**: 3,582
- **Splits**: train
- **Types**: Object, Attribute, Relation

### Fields

| Field | Type | Description |
|-------|------|-------------|
| image | image | The input image |
| question_id | int | Unique question identifier |
| question | string | Multiple-choice question about the image (options A-E, including "None of the above") |
| label | string | Ground truth answer (A/B/C/D/E) |
| type | string | Task type: Object, Attribute, or Relation |

### Subsets

- **HumbleBench**: Standard evaluation
- **HumbleBench-GN**: With Gaussian noise images (set `use_noise_image=True`)
- **HumbleBench-E**: "None of the above" only evaluation (set `nota_only=True`)

## Source

This dataset was converted from [maifoundations/HumbleBench](https://huggingface.co/datasets/maifoundations/HumbleBench) for the MM-Hallu organization.

## Citation

```bibtex
@article{humblebench2025,
  title={Measuring Epistemic Humility in Multimodal Large Language Models},
  author={HumbleBench Team},
  journal={arXiv preprint},
  year={2025}
}
```
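
The `label` and `type` fields described in the dataset card lend themselves to a simple per-type accuracy computation. The following is a minimal sketch, not the benchmark's official evaluation code: the helper names `extract_choice` and `accuracy_by_type` are hypothetical, the answer-parsing rule is a rough heuristic, and the constant `NOTA_LABEL = "E"` assumes that option E is the "None of the above" choice, which the card does not state explicitly. The `nota_only` parameter here only mimics the flag named in the card and may not match its actual behavior.

```python
import re
from collections import defaultdict

# Assumption: option E is the "None of the above" choice.
NOTA_LABEL = "E"


def extract_choice(response):
    """Return the first standalone capital letter A-E in a model reply, or None.

    Heuristic only: it expects replies such as "B", "(B)", or "Answer: B".
    """
    match = re.search(r"\b([A-E])\b", response.strip())
    return match.group(1) if match else None


def accuracy_by_type(records, responses, nota_only=False):
    """Compute accuracy per question type (Object, Attribute, Relation).

    records   -- iterable of dicts with at least "label" and "type" keys
    responses -- raw model replies, aligned with records
    nota_only -- if True, score only questions whose label is NOTA_LABEL
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record, response in zip(records, responses):
        if nota_only and record["label"] != NOTA_LABEL:
            continue
        total[record["type"]] += 1
        if extract_choice(response) == record["label"]:
            correct[record["type"]] += 1
    return {qtype: correct[qtype] / total[qtype] for qtype in total}


# Toy records standing in for real dataset rows (image and question omitted).
demo_records = [
    {"question_id": 1, "type": "Object", "label": "E"},
    {"question_id": 2, "type": "Object", "label": "A"},
    {"question_id": 3, "type": "Relation", "label": "E"},
]
demo_responses = ["E", "Answer: B", "(C)"]

overall = accuracy_by_type(demo_records, demo_responses)
nota = accuracy_by_type(demo_records, demo_responses, nota_only=True)
```

With real data, the records would presumably come from the Hugging Face `datasets` library, for example `load_dataset("MM-Hallu/HumbleBench", split="train")`, with each row's `question` and `image` passed to the model under test to produce the replies.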

Provided by:

MM-Hallu
