KRAFTON/VLM-SubtleBench

Name: KRAFTON/VLM-SubtleBench
Creator: KRAFTON
Published: 2026-03-10 08:36:57
License: cc-by-nc-4.0

Source: Hugging Face. Updated 2026-03-10. Indexed 2026-04-05.

Download link:

https://hf-mirror.com/datasets/KRAFTON/VLM-SubtleBench


Dataset description:

---
license: cc-by-nc-4.0
task_categories:
- visual-question-answering
- image-to-text
language:
- en
tags:
- vlm
- benchmark
- comparative-reasoning
- subtle-difference
- image-comparison
- multi-image
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test.jsonl
  - split: val
    path: data/val.jsonl
dataset_info:
  features:
  - name: image_1
    dtype: image
  - name: image_2
    dtype: image
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: distractors
    sequence: string
  - name: has_caption
    dtype: bool
  - name: caption
    dtype: string
  - name: category
    dtype: string
  - name: domain
    dtype: string
  - name: source
    dtype: string
  - name: source_id
    dtype: string
  - name: raw_folder
    dtype: string
  - name: generation_info
    dtype: string
---

# VLM-SubtleBench

**VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?**

The ability to distinguish subtle differences between visually similar images is essential for diverse domains such as industrial anomaly detection, medical imaging, and aerial surveillance. While comparative reasoning benchmarks for vision-language models (VLMs) have recently emerged, they primarily focus on images with large, salient differences and fail to capture the nuanced reasoning required for real-world applications.

VLM-SubtleBench is a benchmark designed to evaluate VLMs on **subtle comparative reasoning**: detecting fine-grained differences between highly similar image pairs that are easy for humans but challenging for state-of-the-art VLMs. Unlike prior benchmarks restricted to natural image datasets, VLM-SubtleBench spans diverse domains including industrial, aerial, and medical imagery.
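
The feature list in the front matter gives each record a `question`, one `answer`, and a list of `distractors`, which maps naturally onto a multiple-choice item. The following is a minimal sketch of that mapping, not the official evaluation code: the field names come from the dataset card, while the function name `build_multiple_choice`, the seeded shuffling, the letter labels, and the example record are all assumptions made for illustration.

```python
import random
from string import ascii_uppercase


def build_multiple_choice(record, seed=0):
    """Turn one QA record into a lettered prompt and the correct letter.

    `record` is a plain dict with the `question`, `answer`, and
    `distractors` fields named in the feature list. The shuffling,
    letter labels, and prompt layout are assumptions of this sketch.
    """
    options = [record["answer"], *record["distractors"]]
    # Shuffle with a fixed seed so the position of the correct option
    # is reproducible but not always first.
    random.Random(seed).shuffle(options)
    lines = [record["question"]]
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    correct = ascii_uppercase[options.index(record["answer"])]
    return "\n".join(lines), correct


# Hypothetical record, made up for illustration (not taken from the dataset).
example = {
    "question": "In which image is the object closer to the camera?",
    "answer": "First image",
    "distractors": ["Second image"],
}
prompt, correct = build_multiple_choice(example)
print(prompt)
print("correct:", correct)
```

A model's reply can then be scored by comparing the letter it returns with `correct`. The images (`image_1`, `image_2`) would be passed to the model alongside the prompt, and how to do that depends on the model's API.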

## Benchmark Summary

| | |
|---|---|
| **Total QA pairs** | 12,923 |
| **Difference types** | 10 |
| **Image domains** | 6 (Natural, Industrial, Aerial, Synthetic, Medical) |
| **Data sources** | 14 |
| **Human captions** | 1,200 |
| **Splits** | test (11,688) / val (1,235) |
| **Task format** | Multiple-choice VQA + Image Difference Captioning |

> **Note**: Medical domain images (MIMIC-CXR, 362 pairs) are not included due to licensing restrictions, but their QA entries are included in `qa.json`. See [Medical Data](#medical-data-mimic-cxr) below for instructions on how to obtain the images.

## Medical Data (MIMIC-CXR)

The medical domain QA entries (362 attribute comparison pairs from MIMIC-CXR chest X-rays, 664 unique images) are included in `qa.json`, but the corresponding images are not included due to [PhysioNet licensing requirements](https://physionet.org/content/mimic-cxr-jpg/2.1.0/).

### Step 1: Obtain PhysioNet Credentialed Access

1. Create an account at [PhysioNet](https://physionet.org/)
2. Complete the required [CITI training course](https://physionet.org/about/citi-course/) for "Data or Specimens Only Research"
3. Go to [MIMIC-CXR-JPG v2.1.0](https://physionet.org/content/mimic-cxr-jpg/2.1.0/) and sign the data use agreement
4. Wait for your access to be approved

### Step 2: Download Images

We provide a script that automatically downloads only the 664 images required by `qa.json` and places them at the expected paths (`images/mimic/...`).

```bash
python download_mimic.py --user <physionet-username> --password <physionet-password>
```

The script:

- Parses `qa.json` to find all required MIMIC-CXR image paths
- Downloads each image from PhysioNet via `wget`
- Places them under `images/mimic/` preserving the original directory hierarchy (e.g., `images/mimic/p15/p15592981/s55194630/{hash}.jpg`)
- Skips images that already exist, so it is safe to re-run

You can also download individual images manually:

```bash
# wget -O does not create parent directories, so create the target folder first
mkdir -p images/mimic/p15/p15000170/s54385701
wget --user <username> --password <password> \
  https://physionet.org/files/mimic-cxr-jpg/2.1.0/files/p15/p15000170/s54385701/3ea0cd5d-b6ef4a9d-bd053deb-a611067c-284e4144.jpg \
  -O images/mimic/p15/p15000170/s54385701/3ea0cd5d-b6ef4a9d-bd053deb-a611067c-284e4144.jpg
```

## Download and Evaluation

### Download

```bash
# Using huggingface_hub
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('KRAFTON/VLM-SubtleBench', repo_type='dataset', local_dir='VLM-SubtleBench')"
```

Or clone directly with Git LFS:

```bash
git lfs install
git clone https://huggingface.co/datasets/KRAFTON/VLM-SubtleBench
```

### Evaluation

For evaluation code and instructions, please refer to the official GitHub repository: https://github.com/krafton-ai/VLM-SubtleBench

## Citation

```bibtex
@inproceedings{kim2026vlmsubtlebench,
  title={VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?},
  author={Kim, Minkyu and Lee, Sangheon and Park, Dongmin},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026},
  url={https://arxiv.org/abs/2603.07888}
}
```
