abhay2812/vqa-rad
Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/abhay2812/vqa-rad
Resource description:
---
license: cc0-1.0
task_categories:
- visual-question-answering
language:
- en
tags:
- medical
- radiology
- vqa
- medical-vqa
- clinical
- chest-xray
- ct-scan
- mri
pretty_name: "VQA-RAD Full: Visual Question Answering on Radiology Images"
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: qid
    dtype: int64
  - name: image_name
    dtype: string
  - name: image_organ
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: answer_normalized
    dtype: string
  - name: answer_type
    dtype: string
  - name: question_type_primary
    dtype: string
  - name: question_type_raw
    dtype: string
  - name: phrase_type
    dtype: string
  - name: evaluation
    dtype: string
  - name: split
    dtype: string
  - name: image
    dtype: image
  splits:
  - name: train
    num_bytes: 178000000
    num_examples: 1794
  - name: test
    num_bytes: 45000000
    num_examples: 450
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
# VQA-RAD Full: Visual Question Answering on Radiology Images
## Dataset Description
This is a **cleaned and comprehensive** version of the [VQA-RAD dataset](https://doi.org/10.17605/OSF.IO/89KPS), the first manually constructed dataset in which clinicians asked naturally occurring questions about radiology images and provided reference answers. The existing [flaviagiammarino/vqa-rad](https://huggingface.co/datasets/flaviagiammarino/vqa-rad) dataset on Hugging Face contains only image-question-answer triplets. This version **preserves all original metadata** from the source (question types, answer types, image organ labels, phrase types, and evaluation status), which enables fine-grained evaluation of medical VQA systems.
- **Paper:** [A dataset of clinically generated visual questions and answers about radiology images](https://www.nature.com/articles/sdata2018251) (Scientific Data, 2018)
- **Original Source:** [Open Science Framework](https://doi.org/10.17605/OSF.IO/89KPS)
- **License:** [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/)
## Dataset Summary
| | Train | Test | Total |
|---|---|---|---|
| QA pairs | 1,794 | 450 | 2,244 |
| Unique images | 313 | 203 | 314 |
The dataset contains **2,244 question-answer pairs** (after deduplication) on **314 radiology images** sourced from [MedPix®](https://medpix.nlm.nih.gov/), an open-access database of medical images and teaching cases. Questions and answers were manually generated by 15 clinical trainees (medical students and fellows) who had completed core clinical rotations.
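The per-split image counts in the table can be checked from the `image_name` column alone. The following is a minimal sketch, assuming only that each split exposes its filenames as an iterable of strings; `image_overlap` is a hypothetical helper, and the toy filenames are invented for illustration rather than taken from the dataset.

```python
from typing import Iterable


def image_overlap(train_names: Iterable[str], test_names: Iterable[str]) -> dict:
    """Count unique images per split, in total, and shared by both splits."""
    train, test = set(train_names), set(test_names)
    return {
        "train": len(train),
        "test": len(test),
        "total": len(train | test),   # union: images used anywhere
        "shared": len(train & test),  # intersection: images in both splits
    }


# Toy filenames for illustration only (not real dataset entries).
toy_train = ["a.jpg", "a.jpg", "b.jpg", "c.jpg"]
toy_test = ["b.jpg", "d.jpg"]
print(image_overlap(toy_train, toy_test))
```

With the real data, `ds['train']['image_name']` and `ds['test']['image_name']` can be passed in directly. The counts in the table imply that 313 + 203 - 314 = 202 images appear in both splits, so train and test are disjoint at the question level but not at the image level.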
## Data Fields
| Field | Type | Description |
|---|---|---|
| `qid` | int | Unique question ID |
| `image` | image | The radiology image (JPEG) |
| `image_name` | string | Original filename (e.g., `synpic54610.jpg`) |
| `image_organ` | string | Body region: `HEAD`, `CHEST`, or `ABD` |
| `question` | string | The clinical question about the image |
| `answer` | string | Ground truth answer (original casing) |
| `answer_normalized` | string | Lowercase, stripped answer for evaluation |
| `answer_type` | string | `CLOSED` (yes/no or limited-choice) or `OPEN` (free-form) |
| `question_type_primary` | string | Primary question category (see taxonomy below) |
| `question_type_raw` | string | Original question type label (may contain multi-labels) |
| `phrase_type` | string | `freeform`, `para` (paraphrase), `test_freeform`, or `test_para` |
| `evaluation` | string | `evaluated`, `not evaluated`, or `given` |
| `split` | string | `train` or `test` |
## Question Type Taxonomy
As defined in the original paper:
| Question Type | Description | Example |
|---|---|---|
| **PRES** | Object/condition presence | *"Is there a pneumothorax present?"* |
| **POS** | Positional reasoning | *"Where is the lesion located?"* |
| **ABN** | Abnormality | *"Is there something wrong with the image?"* |
| **MODALITY** | Imaging modality | *"Is this a CT or an MRI?"* |
| **PLANE** | Image orientation | *"Is this an axial image?"* |
| **SIZE** | Size/measurement | *"Is the heart enlarged?"* |
| **ORGAN** | Organ system | *"What organ system is pictured?"* |
| **ATTRIB** | Attribute (other) | *"Is the mass well circumscribed?"* |
| **COLOR** | Signal intensity/color | *"Is the lesion more or less dense than the liver?"* |
| **COUNT** | Counting | *"How many lesions are there?"* |
| **OTHER** | Other | Catch-all category |
## Dataset Distributions
### Answer Type
| | CLOSED | OPEN |
|---|---|---|
| Train | 1,297 | 497 |
| Test | 275 | 175 |
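### Image Organ
These counts sum to 2,248, four more than the 2,244 deduplicated QA pairs, so they appear to predate the removal of the 4 duplicates.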
| HEAD | CHEST | ABD |
|---|---|---|
| 715 | 794 | 739 |
### Question Type (Test Free-form)
| Type | CLOSED | OPEN | Total |
|---|---|---|---|
| PRES | 82 | 29 | 111 |
| POS | 3 | 35 | 38 |
| ABN | 25 | 9 | 34 |
| SIZE | 27 | 3 | 30 |
| MODALITY | 15 | 14 | 29 |
| PLANE | 12 | 11 | 23 |
| OTHER | 9 | 11 | 20 |
| ORGAN | 2 | 8 | 10 |
| ATTRIB | 6 | 2 | 8 |
| COUNT | 2 | 1 | 3 |
| COLOR | 2 | 0 | 2 |
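A cross-tabulation like the one above can be recomputed from the `question_type_primary`, `answer_type`, and `phrase_type` fields. The sketch below is one way to do it, not necessarily how the table was produced; `type_by_answer` is a hypothetical helper and the toy rows are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping


def type_by_answer(
    rows: Iterable[Mapping[str, str]], phrase_type: str = "test_freeform"
) -> Counter:
    """Count (question_type_primary, answer_type) pairs for one phrase type."""
    return Counter(
        (row["question_type_primary"], row["answer_type"])
        for row in rows
        if row["phrase_type"] == phrase_type
    )


# Toy rows for illustration only; real rows would come from ds['test'].
toy_rows = [
    {"question_type_primary": "PRES", "answer_type": "CLOSED", "phrase_type": "test_freeform"},
    {"question_type_primary": "PRES", "answer_type": "OPEN", "phrase_type": "test_freeform"},
    {"question_type_primary": "PRES", "answer_type": "CLOSED", "phrase_type": "test_para"},
    {"question_type_primary": "POS", "answer_type": "OPEN", "phrase_type": "test_freeform"},
]
counts = type_by_answer(toy_rows)
for (qtype, atype), n in sorted(counts.items()):
    print(f"{qtype:<10}{atype:<8}{n}")
```

When running this on the real test split, dropping the image column first (for example with `ds['test'].remove_columns('image')`) avoids decoding every image while iterating.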
## Usage
```python
from datasets import load_dataset
ds = load_dataset("abhay2812/vqa-rad")
# Access a sample
sample = ds['train'][0]
print(sample['question']) # "Are regions of the brain infarcted?"
print(sample['answer']) # "Yes"
print(sample['question_type_primary']) # "PRES"
print(sample['answer_type']) # "CLOSED"
print(sample['image_organ']) # "HEAD"
# Filter by question type
pres_questions = ds['test'].filter(lambda x: x['question_type_primary'] == 'PRES')
# Filter by answer type for separate evaluation
closed = ds['test'].filter(lambda x: x['answer_type'] == 'CLOSED')
open_ended = ds['test'].filter(lambda x: x['answer_type'] == 'OPEN')
# Filter test free-form only (standard benchmark split)
test_freeform = ds['test'].filter(lambda x: x['phrase_type'] == 'test_freeform')
```
## Evaluation
Following the original paper and the [Papers with Code leaderboard](https://paperswithcode.com/dataset/vqa-rad), models are typically evaluated on three metrics:
- **Closed-ended Accuracy**: Accuracy on questions with `answer_type == 'CLOSED'` (yes/no or limited-choice)
- **Open-ended Accuracy**: Accuracy on free-form answer questions
- **Overall Accuracy**: Accuracy across all questions
The `answer_normalized` field provides lowercased answers for consistent evaluation matching.
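The three metrics can be sketched as follows. This is a minimal illustration that assumes exact string matching against `answer_normalized`; the card does not specify a matching rule, and other work may score open-ended answers differently. `normalize` and `vqa_accuracy` are hypothetical helpers, and the toy rows and predictions are invented.

```python
from typing import Mapping, Sequence


def normalize(text: str) -> str:
    """Lowercase and strip, mirroring the description of `answer_normalized`."""
    return text.strip().lower()


def vqa_accuracy(rows: Sequence[Mapping[str, str]], predictions: Sequence[str]) -> dict:
    """Exact-match accuracy for CLOSED questions, OPEN questions, and all questions."""
    hits = {"CLOSED": 0, "OPEN": 0}
    totals = {"CLOSED": 0, "OPEN": 0}
    for row, pred in zip(rows, predictions):
        kind = row["answer_type"]
        totals[kind] += 1
        hits[kind] += normalize(pred) == row["answer_normalized"]
    n = sum(totals.values())
    return {
        # None signals that a subset was empty, so no accuracy is defined.
        "closed": hits["CLOSED"] / totals["CLOSED"] if totals["CLOSED"] else None,
        "open": hits["OPEN"] / totals["OPEN"] if totals["OPEN"] else None,
        "overall": sum(hits.values()) / n if n else None,
    }


# Toy data for illustration only.
toy_rows = [
    {"answer_type": "CLOSED", "answer_normalized": "yes"},
    {"answer_type": "CLOSED", "answer_normalized": "yes"},
    {"answer_type": "OPEN", "answer_normalized": "axial"},
    {"answer_type": "OPEN", "answer_normalized": "right lung"},
]
toy_preds = ["Yes", "no ", "Axial", "left lung"]
result = vqa_accuracy(toy_rows, toy_preds)
print(result)
```

Overall accuracy here is computed over all questions pooled together, not as the mean of the closed and open accuracies; the two differ whenever the subsets have different sizes.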
## Cleaning Steps Applied
1. Renamed columns from uppercase Excel headers to clean lowercase names
2. Extracted image filenames from full MedPix URLs
3. Fixed `answer_type` inconsistency (trailing whitespace)
4. Handled 1 null answer (marked as `"unanswerable"`)
5. Converted numeric answers to strings (COUNT-type answers like `4`, `12`, `0.05`)
6. Added `answer_normalized` (lowercase, stripped) for evaluation
7. Fixed question type typos: `ATRIB` → `ATTRIB`, `Other` → `OTHER`, `PRSE` → `PRES`
8. Created `question_type_primary` from multi-label entries (e.g., `SIZE, PRES` → `SIZE`)
9. Removed 4 duplicate image-question-answer triplets
10. Verified all 314 images load correctly
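Steps 4 to 8 can be illustrated with a short sketch. This is not the script used to build the dataset: the helper names are hypothetical, upper-casing every label is an assumption, and taking the first label as the primary type is an assumption that is merely consistent with the `SIZE, PRES` → `SIZE` example.

```python
from typing import Optional, Tuple, Union

# Typos named in step 7; `Other` is handled by upper-casing below.
TYPO_FIXES = {"ATRIB": "ATTRIB", "PRSE": "PRES"}


def clean_question_type(raw: str) -> Tuple[str, str]:
    """Return (cleaned multi-label string, primary label)."""
    labels = [label.strip().upper() for label in raw.split(",")]
    labels = [TYPO_FIXES.get(label, label) for label in labels]
    # Assumption: the first listed label is treated as the primary type.
    return ", ".join(labels), labels[0]


def normalize_answer(answer: Optional[Union[str, int, float]]) -> str:
    """Stringify, strip, and lowercase an answer; nulls become 'unanswerable'."""
    if answer is None:
        return "unanswerable"
    return str(answer).strip().lower()


print(clean_question_type("SIZE, PRES"))
print(clean_question_type("ATRIB"))
print(normalize_answer(" Yes "))
```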
## Citation
If you use this dataset, please cite the original paper:
```bibtex
@article{lau2018dataset,
title={A dataset of clinically generated visual questions and answers about radiology images},
author={Lau, Jason J and Gayen, Soumya and Ben Abacha, Asma and Demner-Fushman, Dina},
journal={Scientific Data},
volume={5},
number={1},
pages={1--10},
year={2018},
publisher={Nature Publishing Group},
doi={10.1038/sdata.2018.251}
}
```
## Acknowledgments
Dataset cleaned and uploaded by [abhay2812](https://huggingface.co/abhay2812). The original dataset was created by researchers at the Lister Hill National Center for Biomedical Communications, National Library of Medicine, and is archived on the [Open Science Framework](https://doi.org/10.17605/OSF.IO/89KPS).
Provider:
abhay2812



