mtybilly/OmniMedVQA-V2
Hugging Face · Updated 2026-04-18 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/mtybilly/OmniMedVQA-V2
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- visual-question-answering
modality:
- image
language:
- en
tags:
- medical
- vqa
- radiology
- pathology
- ophthalmology
- dermatology
pretty_name: OmniMedVQA (mtybilly v2)
configs:
- config_name: mod-ct
  data_files:
  - split: train
    path: mod-ct/train-*.parquet
  - split: test
    path: mod-ct/test-*.parquet
- config_name: mod-mri
  data_files:
  - split: train
    path: mod-mri/train-*.parquet
  - split: test
    path: mod-mri/test-*.parquet
- config_name: mod-xray
  data_files:
  - split: train
    path: mod-xray/train-*.parquet
  - split: test
    path: mod-xray/test-*.parquet
- config_name: mod-fundus
  data_files:
  - split: train
    path: mod-fundus/train-*.parquet
  - split: test
    path: mod-fundus/test-*.parquet
- config_name: mod-derm
  data_files:
  - split: train
    path: mod-derm/train-*.parquet
  - split: test
    path: mod-derm/test-*.parquet
- config_name: mod-micro
  data_files:
  - split: train
    path: mod-micro/train-*.parquet
  - split: test
    path: mod-micro/test-*.parquet
- config_name: mod-oct
  data_files:
  - split: train
    path: mod-oct/train-*.parquet
  - split: test
    path: mod-oct/test-*.parquet
- config_name: mod-us
  data_files:
  - split: train
    path: mod-us/train-*.parquet
  - split: test
    path: mod-us/test-*.parquet
- config_name: qt-ai
  data_files:
  - split: train
    path: qt-ai/train-*.parquet
  - split: test
    path: qt-ai/test-*.parquet
- config_name: qt-dd
  data_files:
  - split: train
    path: qt-dd/train-*.parquet
  - split: test
    path: qt-dd/test-*.parquet
- config_name: qt-lg
  data_files:
  - split: train
    path: qt-lg/train-*.parquet
  - split: test
    path: qt-lg/test-*.parquet
- config_name: qt-mr
  data_files:
  - split: train
    path: qt-mr/train-*.parquet
  - split: test
    path: qt-mr/test-*.parquet
- config_name: qt-oba
  data_files:
  - split: train
    path: qt-oba/train-*.parquet
  - split: test
    path: qt-oba/test-*.parquet
---
# OmniMedVQA — Granular Subsets (v2)
OmniMedVQA is a large-scale medical visual question answering benchmark covering 12 imaging modalities and 5 clinical question types; this release exposes 8 of those modalities as `mod-*` configs. This v2 release replaces the coarse `modality` and `question_type` configs with **13 granular named configs** (`mod-*` and `qt-*`) whose train/test boundaries follow [Med-R1](https://arxiv.org/abs/2503.13939)'s partitioning. Images are sourced from the canonical [foreverbeliever/OmniMedVQA](https://huggingface.co/datasets/foreverbeliever/OmniMedVQA) release. The schema represents restricted-access images (those not distributed in the open-access pool) as metadata-only rows with `image=None`, but no such rows occur in this release.
---
## Abbreviation Key
| Config | Full name | Category |
|--------|-----------|----------|
| `mod-ct` | CT (Computed Tomography) | Modality |
| `mod-mri` | MR (Magnetic Resonance Imaging) | Modality |
| `mod-xray` | X-Ray | Modality |
| `mod-fundus` | Fundus Photography | Modality |
| `mod-derm` | Dermoscopy | Modality |
| `mod-micro` | Microscopy Images | Modality |
| `mod-oct` | OCT (Optical Coherence Tomography) | Modality |
| `mod-us` | Ultrasound | Modality |
| `qt-ai` | Anatomy Identification | Question Type |
| `qt-dd` | Disease Diagnosis | Question Type |
| `qt-lg` | Lesion Grading | Question Type |
| `qt-mr` | Modality Recognition | Question Type |
| `qt-oba` | Other Biological Attributes | Question Type |
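For scripting, the table above can be mirrored as a plain dictionary. This is only an illustrative sketch: `CONFIGS` and `configs_in` are hypothetical helper names, not something shipped with the dataset or the `datasets` library.

```python
# Hypothetical helper mirroring the Abbreviation Key table (not shipped with the dataset).
CONFIGS = {
    "mod-ct": ("CT (Computed Tomography)", "Modality"),
    "mod-mri": ("MR (Magnetic Resonance Imaging)", "Modality"),
    "mod-xray": ("X-Ray", "Modality"),
    "mod-fundus": ("Fundus Photography", "Modality"),
    "mod-derm": ("Dermoscopy", "Modality"),
    "mod-micro": ("Microscopy Images", "Modality"),
    "mod-oct": ("OCT (Optical Coherence Tomography)", "Modality"),
    "mod-us": ("Ultrasound", "Modality"),
    "qt-ai": ("Anatomy Identification", "Question Type"),
    "qt-dd": ("Disease Diagnosis", "Question Type"),
    "qt-lg": ("Lesion Grading", "Question Type"),
    "qt-mr": ("Modality Recognition", "Question Type"),
    "qt-oba": ("Other Biological Attributes", "Question Type"),
}


def configs_in(category: str) -> list:
    """Return config names in the given category, in table order."""
    return [name for name, (_, cat) in CONFIGS.items() if cat == category]


print(configs_in("Question Type"))  # ['qt-ai', 'qt-dd', 'qt-lg', 'qt-mr', 'qt-oba']
```

Keeping the two categories as separate lists makes it easy to iterate one family at a time, which matters because the two families are partitioned independently.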
---
## Per-Subset Row Counts
> `restricted` = rows where `image=None` (restricted-access source dataset; metadata present).
| Config | Train | Test | Restricted (train+test) |
|--------|------:|-----:|------------------------:|
| mod-ct | 12,567 | 3,241 | 0 |
| mod-mri | 25,507 | 6,370 | 0 |
| mod-xray | 6,301 | 1,615 | 0 |
| mod-fundus | 4,300 | 1,098 | 0 |
| mod-derm | 5,373 | 1,306 | 0 |
| mod-micro | 4,570 | 1,110 | 0 |
| mod-oct | 3,798 | 848 | 0 |
| mod-us | 8,917 | 2,074 | 0 |
| qt-ai | 13,119 | 3,329 | 0 |
| qt-dd | 44,329 | 11,058 | 0 |
| qt-lg | 1,662 | 436 | 0 |
| qt-mr | 9,173 | 2,392 | 0 |
| qt-oba | 2,794 | 704 | 0 |
| **Total (`mod-*`)** | **71,333** | **17,662** | **0** |
| **Total (`qt-*`)** | **71,077** | **17,919** | **0** |
All Med-R1 rows map exclusively to open-access source datasets; no restricted-access rows (`image=None`) appear in this release.
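Because the two families are partitioned separately, totals are best computed per family rather than summed across all 13 configs. A quick arithmetic check in plain Python; `COUNTS` is simply a transcription of the table above and `family_totals` is a hypothetical helper:

```python
# (train, test) counts transcribed from the Per-Subset Row Counts table.
COUNTS = {
    "mod-ct": (12_567, 3_241), "mod-mri": (25_507, 6_370),
    "mod-xray": (6_301, 1_615), "mod-fundus": (4_300, 1_098),
    "mod-derm": (5_373, 1_306), "mod-micro": (4_570, 1_110),
    "mod-oct": (3_798, 848), "mod-us": (8_917, 2_074),
    "qt-ai": (13_119, 3_329), "qt-dd": (44_329, 11_058),
    "qt-lg": (1_662, 436), "qt-mr": (9_173, 2_392),
    "qt-oba": (2_794, 704),
}


def family_totals(prefix: str) -> tuple:
    """Sum (train, test) over configs whose name starts with `prefix`."""
    rows = [v for k, v in COUNTS.items() if k.startswith(prefix)]
    return sum(t for t, _ in rows), sum(s for _, s in rows)


for prefix in ("mod-", "qt-"):
    train, test = family_totals(prefix)
    print(prefix, train, test, train + test)
# mod- 71333 17662 88995
# qt- 71077 17919 88996
```

The family totals differ by one row, which is consistent with Med-R1 drawing the two partitions independently.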
**Important:** `mod-*` and `qt-*` subsets are **not** orthogonal views of the same pool. Med-R1 partitioned them independently with separate train/test boundaries. A given (image, question) pair may appear in a `mod-*` train split and a `qt-*` test split (or vice versa). Do not assume the union of all `mod-*` rows equals the union of all `qt-*` rows.
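If you train on one family and evaluate on the other, it is worth checking for cross-family overlap first. The sketch below keys each row on a SHA-256 hash of its raw image bytes plus the `problem` string. It assumes rows carry undecoded images, for example after `ds.cast_column("image", Image(decode=False))` with `Image` imported from `datasets`, which yields `{"bytes": ..., "path": ...}` dicts; `row_key` and `overlapping_keys` are hypothetical helpers, and only byte-identical duplicates are detected.

```python
import hashlib
from typing import Iterable, Set, Tuple

Key = Tuple[str, str]


def row_key(row: dict) -> Key:
    """Key an (image, question) pair by SHA-256 of the raw image bytes plus `problem`.

    Expects an undecoded image dict ({"bytes": ..., "path": ...});
    a missing image is hashed as empty bytes.
    """
    image = row.get("image") or {}
    digest = hashlib.sha256(image.get("bytes") or b"").hexdigest()
    return digest, row["problem"]


def overlapping_keys(train_rows: Iterable[dict], test_rows: Iterable[dict]) -> Set[Key]:
    """Keys present in both collections, i.e. candidate train/test leakage."""
    return {row_key(r) for r in train_rows} & {row_key(r) for r in test_rows}


# Toy rows standing in for, e.g., mod-ct/train and qt-mr/test.
train = [{"image": {"bytes": b"img1", "path": None}, "problem": "Q1 A)x, B)y"}]
test = [
    {"image": {"bytes": b"img1", "path": None}, "problem": "Q1 A)x, B)y"},
    {"image": {"bytes": b"img2", "path": None}, "problem": "Q2 A)x, B)y"},
]
print(len(overlapping_keys(train, test)))  # 1
```

Iterating a `datasets.Dataset` yields row dicts, so a real split can be passed in place of the toy lists.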
---
## Schema
| Field | Type | Description |
|-------|------|-------------|
| `image` | `Image` | Full-resolution image bytes. `None` for restricted-access rows. |
| `problem` | `string` | Med-R1 inline-options format: `"<question> A)opt, B)opt, C)opt, D)opt"` |
| `solution` | `string` | Med-R1 answer format: `"<answer> X </answer>"` (spaces inside tags) |
| `question` | `string` | Plain question text with options stripped |
| `choice_a` | `string` | Text of option A |
| `choice_b` | `string` | Text of option B |
| `choice_c` | `string` | Text of option C; `""` for binary (Yes/No) rows |
| `choice_d` | `string` | Text of option D; `""` if only 2–3 options present (~7.8% of rows) |
| `answer_letter` | `string` | Single char `"A"` / `"B"` / `"C"` / `"D"` |
| `answer_text` | `string` | Full text of the correct choice |
| `modality` | `string` | Full modality name. Fixed for `mod-*`; best-effort from foreverbeliever QA for `qt-*`; `""` if not found. |
| `question_type` | `string` | Full question-type name. Fixed for `qt-*`; best-effort from foreverbeliever QA for `mod-*`; `""` if not found. |
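The inline `problem` format can be split back into a question and its options, which is handy for sanity-checking the structured fields or for other text in the same format. A minimal regex sketch, assuming option markers always look like `A)` through `D)` preceded by whitespace and separated by `, ` exactly as documented above; `parse_problem` is a hypothetical helper, and an option whose own text contains a pattern such as ` B)` would be mis-split.

```python
import re
from typing import Dict, Tuple

# An option marker: whitespace, then a letter A-D, then ")".
_OPTION_RE = re.compile(r"\s([A-D])\)")


def parse_problem(problem: str) -> Tuple[str, Dict[str, str]]:
    """Split an inline-options `problem` string into (question, {letter: option})."""
    markers = list(_OPTION_RE.finditer(problem))
    if not markers:
        return problem.strip(), {}
    question = problem[: markers[0].start()].strip()
    options = {}
    for i, m in enumerate(markers):
        end = markers[i + 1].start() if i + 1 < len(markers) else len(problem)
        # Strip surrounding spaces and the "," that separates this option from the next.
        options[m.group(1)] = problem[m.end():end].strip().rstrip(",").strip()
    return question, options


question, options = parse_problem(
    "What modality is used to capture this image? "
    "A)CT, B)X-ray, C)Nuclear medicine scan, D)PET scan"
)
print(question)  # What modality is used to capture this image?
print(options)   # {'A': 'CT', 'B': 'X-ray', 'C': 'Nuclear medicine scan', 'D': 'PET scan'}
```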
---
## Example Row
```json
{
"image": "<PIL.Image 480x353 RGBA>",
"problem": "What modality is used to capture this image? A)CT, B)X-ray, C)Nuclear medicine scan, D)PET scan",
"solution": "<answer> A </answer>",
"question": "What modality is used to capture this image?",
"choice_a": "CT",
"choice_b": "X-ray",
"choice_c": "Nuclear medicine scan",
"choice_d": "PET scan",
"answer_letter": "A",
"answer_text": "CT",
"modality": "CT(Computed Tomography)",
"question_type": "Other Biological Attributes"
}
```
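The example shows how the redundant fields relate: `answer_text` equals the option named by `answer_letter`, and `solution` wraps the same letter in answer tags. A hypothetical `check_row` sketch that tests these invariants on a row dict (the image is irrelevant here and omitted); the contiguity check assumes, per the schema, that empty options only ever occur at the end (C and/or D).

```python
import re
from typing import List

_SOLUTION_RE = re.compile(r"<answer>\s*([A-D])\s*</answer>")


def check_row(row: dict) -> List[str]:
    """Return invariant violations for one row; an empty list means consistent."""
    issues = []
    choices = {k.upper(): row[f"choice_{k}"] for k in "abcd"}
    letter = row["answer_letter"]
    if choices.get(letter, "") == "":
        issues.append(f"answer_letter {letter!r} names an empty or unknown option")
    elif choices[letter] != row["answer_text"]:
        issues.append("answer_text differs from the option named by answer_letter")
    # Non-empty options should form a prefix A, AB, ABC or ABCD.
    filled = [choices[k] != "" for k in "ABCD"]
    if filled != sorted(filled, reverse=True):
        issues.append("non-empty options are not contiguous from A")
    tag = _SOLUTION_RE.search(row["solution"])
    if tag is None or tag.group(1) != letter:
        issues.append("solution tag disagrees with answer_letter")
    return issues


example = {
    "solution": "<answer> A </answer>",
    "choice_a": "CT",
    "choice_b": "X-ray",
    "choice_c": "Nuclear medicine scan",
    "choice_d": "PET scan",
    "answer_letter": "A",
    "answer_text": "CT",
}
print(check_row(example))  # []
```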
---
## Usage
```python
from datasets import load_dataset

# Load the CT modality subset (train split)
ds = load_dataset("mtybilly/OmniMedVQA-V2", "mod-ct", split="train")
print(ds[0])

# Load the Disease Diagnosis subset (test split)
ds = load_dataset("mtybilly/OmniMedVQA-V2", "qt-dd", split="test")

# Iterate over all 13 configs; without `split=`, each call returns a DatasetDict
# holding both the "train" and "test" splits
configs = [
    "mod-ct", "mod-mri", "mod-xray", "mod-fundus",
    "mod-derm", "mod-micro", "mod-oct", "mod-us",
    "qt-ai", "qt-dd", "qt-lg", "qt-mr", "qt-oba",
]
for cfg in configs:
    ds = load_dataset("mtybilly/OmniMedVQA-V2", cfg)
    print(cfg, ds)

# Filter out restricted-access rows (image=None) from every split of the last
# DatasetDict loaded above; this is a no-op for this release, which has no such rows
ds_open = ds.filter(lambda x: x["image"] is not None)
```
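For evaluation, model generations in Med-R1's `<answer> X </answer>` format can be scored against `answer_letter`. A minimal sketch: `extract_letter` and `accuracy` are hypothetical helpers, the regex tolerates extra whitespace inside the tags, the last tag wins if a generation contains several, and an unparseable generation counts as wrong.

```python
import re
from typing import Optional, Sequence

_ANSWER_RE = re.compile(r"<answer>\s*([A-D])\s*</answer>")


def extract_letter(generation: str) -> Optional[str]:
    """Return the letter from the last <answer> X </answer> tag, or None."""
    found = _ANSWER_RE.findall(generation)
    return found[-1] if found else None


def accuracy(generations: Sequence[str], gold_letters: Sequence[str]) -> float:
    """Fraction of generations whose extracted letter equals the gold letter."""
    if len(generations) != len(gold_letters):
        raise ValueError("generations and gold_letters must have the same length")
    if not gold_letters:
        return 0.0
    hits = sum(extract_letter(g) == y for g, y in zip(generations, gold_letters))
    return hits / len(gold_letters)


outputs = ["Reasoning... <answer> A </answer>", "The answer is B", "<answer>C</answer>"]
print(accuracy(outputs, ["A", "B", "D"]))  # 0.3333333333333333
```

Gold letters would come from the `answer_letter` column, e.g. `ds["answer_letter"]` on a loaded split.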
To cap images at `max_pixels=401408` (512 × 28 × 28, the project standard for training), downscale oversized images while preserving their aspect ratio:
```python
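from datasets import load_dataset
from PIL import Image as PILImage


def resize_if_needed(example, max_pixels=401408):
    """Downscale so that width * height <= max_pixels, keeping the aspect ratio."""
    img = example["image"]
    if img is None:  # restricted-access rows carry no image
        return example
    w, h = img.size
    if w * h > max_pixels:
        scale = (max_pixels / (w * h)) ** 0.5
        img = img.resize((int(w * scale), int(h * scale)), PILImage.LANCZOS)
        example["image"] = img
    return example


ds = load_dataset("mtybilly/OmniMedVQA-V2", "mod-ct", split="train")
ds = ds.map(resize_if_needed)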
```
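The square root in the resize function above follows from scaling both sides by the same factor $s$, which multiplies the pixel count by $s^2$. Writing $P_{\max}$ for `max_pixels` and $w, h$ for the original width and height:

$$
s = \sqrt{\frac{P_{\max}}{w\,h}}, \qquad (s\,w)(s\,h) = s^{2}\,w\,h = P_{\max}.
$$

Because `int()` truncates each scaled side toward zero, the resized image satisfies $\lfloor s\,w \rfloor \cdot \lfloor s\,h \rfloor \le P_{\max}$, and the aspect ratio is preserved up to rounding.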
---
## Source & Derivation
This dataset combines two upstream sources (items 1 and 2); item 3 describes how this release restructures them:
1. **OmniMedVQA** (canonical benchmark):
Hu, Yutao et al. "OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM." *CVPR 2024*. arXiv:2402.09181.
HF upstream: [foreverbeliever/OmniMedVQA](https://huggingface.co/datasets/foreverbeliever/OmniMedVQA)
2. **Med-R1** (train/test partitioning):
Lai, Yuxiang et al. "Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models." arXiv:2503.13939.
GitHub: [Yuxiang-Lai117/Med-R1](https://github.com/Yuxiang-Lai117/Med-R1) — `Splits/{modality,question_type}/{train,test}/*.json`
3. **This release** (mtybilly/OmniMedVQA-V2):
Restructured from the above two sources. The train/test split boundaries for each of the 13 configs are taken verbatim from Med-R1's JSON files, and image bytes come from foreverbeliever/OmniMedVQA's open-access pool. The `problem` and `solution` fields use Med-R1's text; `modality` and `question_type` are fixed by the config where the config determines them and are otherwise filled from foreverbeliever's QA JSON metadata.
---
## License
**CC BY-NC 4.0** — Non-commercial use only.
This dataset inherits the license of its upstream sources:
- foreverbeliever/OmniMedVQA is released under CC BY-NC 4.0
- OmniMedVQA is built from multiple publicly available medical datasets, each with its own license (see the original OmniMedVQA paper and repository for per-dataset attribution)
**Non-commercial use only.** Do not use this dataset for commercial purposes.
---
## Changelog
| Version | Date | Notes |
|---------|------|-------|
| v1 | — | Original `mtybilly/OmniMedVQA` with two configs: `modality`, `question_type` |
| v2 | 2026-04-16 | Replaced `modality`/`question_type` configs with 13 granular `mod-*`/`qt-*` subsets following Med-R1's partitioning. Added `question`, `choice_a/b/c/d`, `answer_letter`, `answer_text` fields. Image bytes from foreverbeliever/OmniMedVQA open-access pool. |
---
## Known Limitations
1. **Restricted-access images:** Every row maps to an open-access source dataset, so no `image=None` rows appear in this release: 88,995 rows across the `mod-*` subsets (71,333 train + 17,662 test) and 88,996 rows across the `qt-*` subsets (71,077 train + 17,919 test). foreverbeliever/OmniMedVQA does contain ~35k restricted-access QA records, but Med-R1's partitioning draws only from the open-access pool.
2. **Orthogonal label coverage:** The `modality` field in `qt-*` subsets and the `question_type` field in `mod-*` subsets are populated on a best-effort basis from foreverbeliever's QA JSONs. If a row's image path does not appear in the open-access or restricted-access QA pool, the field is `""`.
3. **Independent partition boundaries:** `mod-*` and `qt-*` subsets are not guaranteed to cover the same rows, and their train/test boundaries differ. A row from `mod-ct/train` may appear in `qt-mr/test`; this is by design, reflecting Med-R1's independent partitioning.
4. **Images not resized:** Images are stored at original resolution. Downscale to `max_pixels=401408` (the project standard) yourself at training or evaluation time.
5. **Binary and 3-option rows:** ~7.8% of rows have `choice_d=""`. Most of these are 3-option questions; a small fraction are binary Yes/No questions with `choice_c=choice_d=""`. These are valid rows from the original OmniMedVQA benchmark.
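To handle the mixed option counts and the possibly empty cross-label fields described in items 2 and 5, it can help to tally rows before filtering. A plain-Python sketch over row dicts; `n_options` and `summarize` are hypothetical helper names, and the commented `filter` call shows one way to apply the same test to a loaded `datasets.Dataset` while passing only the `choice_d` column to the predicate.

```python
from collections import Counter
from typing import Iterable


def n_options(row: dict) -> int:
    """Number of non-empty choice fields: 2 for Yes/No rows, up to 4."""
    return sum(1 for k in "abcd" if row[f"choice_{k}"] != "")


def summarize(rows: Iterable[dict]) -> dict:
    """Tally option counts and rows with an empty cross-label field."""
    stats = {"options": Counter(), "missing_modality": 0, "missing_question_type": 0}
    for row in rows:
        stats["options"][n_options(row)] += 1
        stats["missing_modality"] += int(row["modality"] == "")
        stats["missing_question_type"] += int(row["question_type"] == "")
    return stats


rows = [
    {"choice_a": "Yes", "choice_b": "No", "choice_c": "", "choice_d": "",
     "modality": "", "question_type": "Disease Diagnosis"},
    {"choice_a": "x", "choice_b": "y", "choice_c": "z", "choice_d": "",
     "modality": "CT(Computed Tomography)", "question_type": ""},
]
print(summarize(rows))

# On a loaded split, keep only 4-option rows:
# ds_4 = ds.filter(lambda d: d != "", input_columns="choice_d")
```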
Provided by:
mtybilly



