mtybilly/OmniMedVQA-V2

Name: mtybilly/OmniMedVQA-V2
Creator: mtybilly
Published: 2026-04-18 20:34:07
License: CC BY-NC 4.0 (per the dataset card; the listing itself gives no description)

Hugging Face · Updated 2026-04-18 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/mtybilly/OmniMedVQA-V2

Resource description:

---
license: cc-by-nc-4.0
task_categories:
- visual-question-answering
modality:
- image
language:
- en
tags:
- medical
- vqa
- radiology
- pathology
- ophthalmology
- dermatology
pretty_name: OmniMedVQA (mtybilly v2)
configs:
- config_name: mod-ct
  data_files:
  - split: train
    path: mod-ct/train-*.parquet
  - split: test
    path: mod-ct/test-*.parquet
- config_name: mod-mri
  data_files:
  - split: train
    path: mod-mri/train-*.parquet
  - split: test
    path: mod-mri/test-*.parquet
- config_name: mod-xray
  data_files:
  - split: train
    path: mod-xray/train-*.parquet
  - split: test
    path: mod-xray/test-*.parquet
- config_name: mod-fundus
  data_files:
  - split: train
    path: mod-fundus/train-*.parquet
  - split: test
    path: mod-fundus/test-*.parquet
- config_name: mod-derm
  data_files:
  - split: train
    path: mod-derm/train-*.parquet
  - split: test
    path: mod-derm/test-*.parquet
- config_name: mod-micro
  data_files:
  - split: train
    path: mod-micro/train-*.parquet
  - split: test
    path: mod-micro/test-*.parquet
- config_name: mod-oct
  data_files:
  - split: train
    path: mod-oct/train-*.parquet
  - split: test
    path: mod-oct/test-*.parquet
- config_name: mod-us
  data_files:
  - split: train
    path: mod-us/train-*.parquet
  - split: test
    path: mod-us/test-*.parquet
- config_name: qt-ai
  data_files:
  - split: train
    path: qt-ai/train-*.parquet
  - split: test
    path: qt-ai/test-*.parquet
- config_name: qt-dd
  data_files:
  - split: train
    path: qt-dd/train-*.parquet
  - split: test
    path: qt-dd/test-*.parquet
- config_name: qt-lg
  data_files:
  - split: train
    path: qt-lg/train-*.parquet
  - split: test
    path: qt-lg/test-*.parquet
- config_name: qt-mr
  data_files:
  - split: train
    path: qt-mr/train-*.parquet
  - split: test
    path: qt-mr/test-*.parquet
- config_name: qt-oba
  data_files:
  - split: train
    path: qt-oba/train-*.parquet
  - split: test
    path: qt-oba/test-*.parquet
---

# OmniMedVQA: Granular Subsets (v2)

OmniMedVQA is a large-scale medical visual question answering benchmark covering 12 imaging modalities and 5 clinical question types.
This v2 release replaces the coarse `modality` and `question_type` configs with **13 granular named configs** (8 `mod-*` and 5 `qt-*`) whose train/test boundaries follow [Med-R1](https://arxiv.org/abs/2503.13939)'s partitioning. Images are sourced from the canonical [foreverbeliever/OmniMedVQA](https://huggingface.co/datasets/foreverbeliever/OmniMedVQA) release. Restricted-access images (those not distributed in the open-access pool) would be represented as metadata-only rows with `image=None`; as the row counts below show, no such rows occur in this release.

---

## Abbreviation Key

| Config | Full name | Category |
|--------|-----------|----------|
| `mod-ct` | CT (Computed Tomography) | Modality |
| `mod-mri` | MR (Magnetic Resonance Imaging) | Modality |
| `mod-xray` | X-Ray | Modality |
| `mod-fundus` | Fundus Photography | Modality |
| `mod-derm` | Dermoscopy | Modality |
| `mod-micro` | Microscopy Images | Modality |
| `mod-oct` | OCT (Optical Coherence Tomography) | Modality |
| `mod-us` | Ultrasound | Modality |
| `qt-ai` | Anatomy Identification | Question Type |
| `qt-dd` | Disease Diagnosis | Question Type |
| `qt-lg` | Lesion Grading | Question Type |
| `qt-mr` | Modality Recognition | Question Type |
| `qt-oba` | Other Biological Attributes | Question Type |

---

## Per-Subset Row Counts

> `restricted` = rows where `image=None` (restricted-access source dataset; metadata present).

| Config | Train | Test | Restricted (train+test) |
|--------|------:|-----:|------------------------:|
| mod-ct | 12,567 | 3,241 | 0 |
| mod-mri | 25,507 | 6,370 | 0 |
| mod-xray | 6,301 | 1,615 | 0 |
| mod-fundus | 4,300 | 1,098 | 0 |
| mod-derm | 5,373 | 1,306 | 0 |
| mod-micro | 4,570 | 1,110 | 0 |
| mod-oct | 3,798 | 848 | 0 |
| mod-us | 8,917 | 2,074 | 0 |
| qt-ai | 13,119 | 3,329 | 0 |
| qt-dd | 44,329 | 11,058 | 0 |
| qt-lg | 1,662 | 436 | 0 |
| qt-mr | 9,173 | 2,392 | 0 |
| qt-oba | 2,794 | 704 | 0 |
| **Total (`mod-*`)** | **71,333** | **17,662** | **0** |
| **Total (`qt-*`)** | **71,077** | **17,919** | **0** |

All Med-R1 rows map exclusively to open-access source datasets; no restricted-access rows (`image=None`) appear in this release.

**Important:** `mod-*` and `qt-*` subsets are **not** orthogonal views of the same pool. Med-R1 partitioned them independently with separate train/test boundaries. A given (image, question) pair may appear in a `mod-*` train split and a `qt-*` test split (or vice versa). Do not assume the union of all `mod-*` rows equals the union of all `qt-*` rows.

---

## Schema

| Field | Type | Description |
|-------|------|-------------|
| `image` | `Image` | Full-resolution image bytes. `None` for restricted-access rows. |
| `problem` | `string` | Med-R1 inline-options format: `"<question> A)opt, B)opt, C)opt, D)opt"` |
| `solution` | `string` | Med-R1 answer format: `"<answer> X </answer>"` (spaces inside tags) |
| `question` | `string` | Plain question text with options stripped |
| `choice_a` | `string` | Text of option A |
| `choice_b` | `string` | Text of option B |
| `choice_c` | `string` | Text of option C; `""` for binary (Yes/No) rows |
| `choice_d` | `string` | Text of option D; `""` if only 2–3 options present (~7.8% of rows) |
| `answer_letter` | `string` | Single char `"A"` / `"B"` / `"C"` / `"D"` |
| `answer_text` | `string` | Full text of the correct choice |
| `modality` | `string` | Full modality name. Fixed for `mod-*`; best-effort from foreverbeliever QA for `qt-*`; `""` if not found. |
| `question_type` | `string` | Full question-type name. Fixed for `qt-*`; best-effort from foreverbeliever QA for `mod-*`; `""` if not found. |

---

## Example Row

```json
{
  "image": "<PIL.Image 480x353 RGBA>",
  "problem": "What modality is used to capture this image? A)CT, B)X-ray, C)Nuclear medicine scan, D)PET scan",
  "solution": "<answer> A </answer>",
  "question": "What modality is used to capture this image?",
  "choice_a": "CT",
  "choice_b": "X-ray",
  "choice_c": "Nuclear medicine scan",
  "choice_d": "PET scan",
  "answer_letter": "A",
  "answer_text": "CT",
  "modality": "CT(Computed Tomography)",
  "question_type": "Other Biological Attributes"
}
```

---

## Usage

```python
from datasets import load_dataset

# Load the CT modality subset, train split
ds = load_dataset("mtybilly/OmniMedVQA-V2", "mod-ct", split="train")
print(ds[0])

# Load the Disease Diagnosis subset, test split
ds = load_dataset("mtybilly/OmniMedVQA-V2", "qt-dd", split="test")

# Iterate all 13 configs
configs = [
    "mod-ct", "mod-mri", "mod-xray", "mod-fundus",
    "mod-derm", "mod-micro", "mod-oct", "mod-us",
    "qt-ai", "qt-dd", "qt-lg", "qt-mr", "qt-oba",
]
for cfg in configs:
    # Without `split=`, this returns a DatasetDict holding train and test
    ds = load_dataset("mtybilly/OmniMedVQA-V2", cfg)
    print(cfg, ds)

# Filter out restricted-access rows (image=None).
# `ds` is the last config loaded in the loop; the filter runs on each split.
ds_open = ds.filter(lambda x: x["image"] is not None)
```

For training with `max_pixels=401408` (project standard):

```python
from datasets import load_dataset
from PIL import Image as PILImage


def resize_if_needed(example, max_pixels=401408):
    """Downscale so width * height <= max_pixels, keeping the aspect ratio."""
    img = example["image"]
    if img is None:  # restricted-access row: nothing to resize
        return example
    w, h = img.size
    if w * h > max_pixels:
        scale = (max_pixels / (w * h)) ** 0.5
        # max(1, ...) guards against a zero dimension on extreme aspect ratios
        new_size = (max(1, int(w * scale)), max(1, int(h * scale)))
        img = img.resize(new_size, PILImage.LANCZOS)
        example["image"] = img
    return example


# Apply to a split; `map` re-encodes the returned PIL image
ds = load_dataset("mtybilly/OmniMedVQA-V2", "mod-ct", split="train")
ds = ds.map(resize_if_needed)
```

---

## Source & Derivation

This dataset builds on two upstream sources, combined in this release:

1. **OmniMedVQA** (canonical benchmark): Hu, Yutao et al. "OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM." *CVPR 2024*. arXiv:2402.09181. HF upstream: [foreverbeliever/OmniMedVQA](https://huggingface.co/datasets/foreverbeliever/OmniMedVQA)
2. **Med-R1** (train/test partitioning): Lai, Yuxiang et al. "Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models." arXiv:2503.13939. GitHub: [Yuxiang-Lai117/Med-R1](https://github.com/Yuxiang-Lai117/Med-R1), under `Splits/{modality,question_type}/{train,test}/*.json`
3. **This release** (mtybilly/OmniMedVQA-V2): Restructured from the above two sources. The train/test split boundaries for each of the 13 configs are taken verbatim from Med-R1's JSON files. Image bytes are sourced from foreverbeliever/OmniMedVQA's open-access pool. The `problem` and `solution` fields use the Med-R1 text; the `modality` and `question_type` fields use the foreverbeliever QA JSON metadata.

---

## License

**CC BY-NC 4.0**: non-commercial use only. Do not use this dataset for commercial purposes.

This dataset inherits the license of its upstream sources:

- foreverbeliever/OmniMedVQA is released under CC BY-NC 4.0
- OmniMedVQA is built from multiple publicly available medical datasets, each with its own license (see the original OmniMedVQA paper and repository for per-dataset attribution)

---

## Changelog

| Version | Date | Notes |
|---------|------|-------|
| v1 | - | Original `mtybilly/OmniMedVQA` with two configs: `modality`, `question_type` |
| v2 | 2026-04-16 | Replaced `modality`/`question_type` configs with 13 granular `mod-*`/`qt-*` subsets following Med-R1's partitioning. Added `question`, `choice_a/b/c/d`, `answer_letter`, `answer_text` fields. Image bytes from foreverbeliever/OmniMedVQA open-access pool. |

---

## Known Limitations

1. **Restricted-access images:** None in this release. The `mod-*` subsets total 88,995 rows (71,333 train + 17,662 test) and the `qt-*` subsets total 88,996 rows (71,077 train + 17,919 test); all of them map to open-access source datasets, so there are no `image=None` rows. (foreverbeliever/OmniMedVQA contains ~35k restricted-access QA records, but Med-R1's partitioning happened to draw only from the open-access pool.)
2. **Orthogonal label coverage:** The `modality` field in `qt-*` subsets and the `question_type` field in `mod-*` subsets are populated on a best-effort basis from foreverbeliever's QA JSONs. If a row's image path does not appear in either the open-access or the restricted-access QA pool, the field is `""`.
3. **Independent partition boundaries:** `mod-*` and `qt-*` subsets do not cover the same rows. A row from `mod-ct/train` may appear in `qt-mr/test`; this is by design, reflecting Med-R1's independent partitioning.
4. **Images not resized:** Images are stored at original resolution. Apply `max_pixels=401408` yourself at training/eval time (project standard).
5. **Binary and 3-option rows:** ~7.8% of rows have `choice_d=""` (3-option) and a small fraction have `choice_c=choice_d=""` (binary Yes/No questions). These are valid rows from the original OmniMedVQA benchmark.
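
The schema describes `solution` as `"<answer> X </answer>"` and marks missing options with `""`. A minimal sketch of how those fields might be parsed and cross-checked follows. The helper names `parse_solution`, `available_choices`, and `check_row` are hypothetical and not part of the dataset or of Med-R1, and treating whitespace inside the tags as optional is an assumption.

```python
import re

# Matches the Med-R1 answer format, e.g. "<answer> A </answer>".
# Whitespace inside the tags is treated as optional (an assumption).
_ANSWER_RE = re.compile(r"<answer>\s*([A-D])\s*</answer>")


def parse_solution(solution: str) -> str:
    """Return the answer letter embedded in a `solution` string."""
    match = _ANSWER_RE.search(solution)
    if match is None:
        raise ValueError(f"unrecognized solution format: {solution!r}")
    return match.group(1)


def available_choices(row: dict) -> dict:
    """Map option letters to their text, skipping empty-string placeholders."""
    letters = ("A", "B", "C", "D")
    fields = ("choice_a", "choice_b", "choice_c", "choice_d")
    return {l: row[f] for l, f in zip(letters, fields) if row[f] != ""}


def check_row(row: dict) -> bool:
    """True if `solution`, `answer_letter`, and `answer_text` agree."""
    letter = parse_solution(row["solution"])
    choices = available_choices(row)
    return letter == row["answer_letter"] and choices.get(letter) == row["answer_text"]


# Text fields copied from the Example Row section (image omitted).
row = {
    "solution": "<answer> A </answer>",
    "choice_a": "CT",
    "choice_b": "X-ray",
    "choice_c": "Nuclear medicine scan",
    "choice_d": "PET scan",
    "answer_letter": "A",
    "answer_text": "CT",
}
print(parse_solution(row["solution"]))  # A
print(check_row(row))                   # True
```

Skipping `""` placeholders in `available_choices` means binary and 3-option rows yield only the options that actually exist, which avoids offering an empty choice to a model at evaluation time.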
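
Because the `mod-*` and `qt-*` families are partitioned independently, their totals are best computed per family. The sketch below sums the per-config counts copied from the Per-Subset Row Counts table; the names `COUNTS` and `family_totals` are illustrative only.

```python
# Per-config (train, test) counts copied from the Per-Subset Row Counts table.
COUNTS = {
    "mod-ct": (12567, 3241),
    "mod-mri": (25507, 6370),
    "mod-xray": (6301, 1615),
    "mod-fundus": (4300, 1098),
    "mod-derm": (5373, 1306),
    "mod-micro": (4570, 1110),
    "mod-oct": (3798, 848),
    "mod-us": (8917, 2074),
    "qt-ai": (13119, 3329),
    "qt-dd": (44329, 11058),
    "qt-lg": (1662, 436),
    "qt-mr": (9173, 2392),
    "qt-oba": (2794, 704),
}


def family_totals(counts: dict, prefix: str) -> tuple:
    """Sum train and test counts over configs whose name starts with `prefix`."""
    train = sum(tr for name, (tr, _) in counts.items() if name.startswith(prefix))
    test = sum(te for name, (_, te) in counts.items() if name.startswith(prefix))
    return train, test


for prefix in ("mod-", "qt-"):
    train, test = family_totals(COUNTS, prefix)
    print(f"{prefix}*: train={train:,} test={test:,} total={train + test:,}")
# mod-*: train=71,333 test=17,662 total=88,995
# qt-*: train=71,077 test=17,919 total=88,996
```

The two families do not sum to the same number of rows, so a single grand-total row cannot describe both.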
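
Since a row from a `mod-*` train split may reappear in a `qt-*` test split, it can be worth measuring that overlap before training on one family and evaluating on the other. The following is a sketch under stated assumptions: the dataset exposes no shared row identifier, so rows are fingerprinted here by `problem` text plus raw image pixels, and the helper names `row_key` and `count_overlap` are hypothetical.

```python
import hashlib


def row_key(row: dict) -> str:
    """Fingerprint a row by its problem text and, if present, its pixel data.

    Assumption: two rows count as the same (image, question) pair only when
    both the text and the decoded pixels match exactly.
    """
    digest = hashlib.sha256(row["problem"].encode("utf-8"))
    image = row.get("image")
    if image is not None:
        # PIL images expose mode, size, and tobytes(); mode and size are mixed
        # in so differently shaped images with equal byte strings still differ.
        digest.update(f"{image.mode}:{image.size}".encode("utf-8"))
        digest.update(image.tobytes())
    return digest.hexdigest()


def count_overlap(train_rows, test_rows, key=row_key) -> int:
    """Count test rows whose fingerprint also occurs among the train rows."""
    train_keys = {key(r) for r in train_rows}
    return sum(1 for r in test_rows if key(r) in train_keys)


# Usage with the real splits (requires network access, so left commented out):
# from datasets import load_dataset
# train = load_dataset("mtybilly/OmniMedVQA-V2", "mod-ct", split="train")
# test = load_dataset("mtybilly/OmniMedVQA-V2", "qt-mr", split="test")
# print(count_overlap(train, test))
```

Exact-match fingerprinting is deliberately strict: it will miss pairs whose images were re-encoded in a way that changes the decoded pixels, so the count should be read as a lower bound.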
