
MM-Hallu/HumbleBench

Source: Hugging Face. Updated 2026-04-26. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/MM-Hallu/HumbleBench
Description:
---
license: mit
task_categories:
- visual-question-answering
tags:
- hallucination
- benchmark
- multimodal
- humility
- epistemic-humility
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: train-*.parquet
dataset_info:
  features:
  - name: image
    dtype: image
  - name: question_id
    dtype: int64
  - name: question
    dtype: string
  - name: label
    dtype: string
  - name: type
    dtype: string
  splits:
  - name: train
    num_examples: 22831
---

# HumbleBench

HumbleBench is a multimodal hallucination benchmark for evaluating epistemic humility in Multimodal Large Language Models (MLLMs). It tests whether models can recognize when none of the provided answer options are correct -- a behavior reflecting epistemic humility.

## Paper

**Measuring Epistemic Humility in Multimodal Large Language Models**

## Dataset Structure

- **Total examples**: 22,831
- **Unique images**: 3,582
- **Splits**: train
- **Types**: Object, Attribute, Relation

### Fields

| Field | Type | Description |
|-------|------|-------------|
| image | image | The input image |
| question_id | int | Unique question identifier |
| question | string | Multiple-choice question about the image (options A-E, including "None of the above") |
| label | string | Ground truth answer (A/B/C/D/E) |
| type | string | Task type: Object, Attribute, or Relation |

### Subsets

- **HumbleBench**: Standard evaluation
- **HumbleBench-GN**: With Gaussian noise images (set `use_noise_image=True`)
- **HumbleBench-E**: "None of the above" only evaluation (set `nota_only=True`)

## Source

This dataset was converted from [maifoundations/HumbleBench](https://huggingface.co/datasets/maifoundations/HumbleBench) for the MM-Hallu organization.

## Citation

```bibtex
@article{humblebench2025,
  title={Measuring Epistemic Humility in Multimodal Large Language Models},
  author={HumbleBench Team},
  journal={arXiv preprint},
  year={2025}
}
```
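
## Example: scoring predictions by type

The card lists `question_id`, `label`, and `type` fields and describes a "None of the above" only evaluation, but it does not show how scores are computed. The following is a minimal sketch of one way to compute per-type accuracy from those fields, not the benchmark's official scorer. The function name `accuracy_by_type`, the `predictions` mapping, and the `nota_label` parameter are hypothetical, and treating `"E"` as the "None of the above" option is an assumption: the card only says that options A-E include it.

```python
from collections import defaultdict


def accuracy_by_type(records, predictions, nota_label=None):
    """Return {type: accuracy} for records that have a label and a type.

    records:     iterable of dicts with "question_id", "label", "type"
    predictions: dict mapping question_id -> predicted letter (hypothetical format)
    nota_label:  if given, score only records whose label equals this letter
                 (assumed to be the "None of the above" option)
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        if nota_label is not None and rec["label"] != nota_label:
            continue  # restrict to "None of the above" questions
        total[rec["type"]] += 1
        # A missing prediction counts as wrong; letters are compared case-insensitively.
        predicted = predictions.get(rec["question_id"], "").strip().upper()
        if predicted == rec["label"]:
            correct[rec["type"]] += 1
    return {t: correct[t] / total[t] for t in total}


# Toy rows that mimic the documented schema (not real benchmark data).
records = [
    {"question_id": 1, "label": "E", "type": "Object"},
    {"question_id": 2, "label": "A", "type": "Object"},
    {"question_id": 3, "label": "E", "type": "Relation"},
]
predictions = {1: "E", 2: "B", 3: "a"}

print(accuracy_by_type(records, predictions))
print(accuracy_by_type(records, predictions, nota_label="E"))
```

Rows loaded with the Hugging Face `datasets` library, for example `load_dataset("MM-Hallu/HumbleBench", split="train")`, should expose the same field names according to the schema in the card, so they could be passed as `records` directly.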
Provider:
MM-Hallu